MEL Q KRG+F+ M Y+D NRV++IY +PIAEIV++FFD+LKSST+GYASFDY++ Y+
Sbjct: 429 MELCQGKRGNFIDMQYLDMNRVSIIYDMPLAEIVYEFFDQLKSSTKGYASFDYELIGYKP 488

Query: 488 SQLVKMDILLNGDKVDALSFIVHKEFAYERGKIIVEKLKKIIPRQQFEVPIQAAIGQKIV 547

S+LVKMDI+LNG+K+DALSFIVH+++AYERGK+IVEKLK++IPRQQFEVP+QAAIGQKIV 
Sbjct: 489 SKLVKMDIMLNGEKIDALSFIVHRDYAYERGKVIVEKLKELIPRQQFEVPVQAAIGQKIV 548

Query: 548 ARSDIKALRKNVIAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLSVLSMDDD 607
ARS IKA+RKNVIAKCYGGD+SRKRKLLEKQK GK+RMK +GSVEVPQEAF++VL MDD 

Query: 608 TKK 610 
KK 

Sbjct: 609 PKK 611 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 587/610 (96%), Positives = 601/610 (98%)
Query: 1 IffllEDLKKKQEKIRNFSIIAHIDHGKSTLADRILEKTETVSSREMQAQLLDSMDLERERG 60 



Query: 61 ITIKimiELNYTAKDGETYIFHLIDTPGHVDFTYEVSRSLAA.CEGAILVVDAAQGIEAQ 120 

ITIKL^AIEtNYTAKDGETYIFHLIDTPGHVDFTYEVSRSLAACEGAILVVDAAQGIEAQ 
Sbjct: 61 ITIKMAIELNYTAKDGETYIFHLIDTPGHVDFTYEVSRSLAACEGAILVVDAAQGIEAQ 120

Query: 121 TLANVYLALDNDLEILPVINKIDLPAADPERVRAEVEDVIGLDASEAVLASAKAGIGIEE 180 

TIANVYLALDNDLEILPVINKIDLPAADPERVR EVEDVIGLDASEAVLASAKAGIGIEE
Sbjct: 121 TIANVYLALDNDLEILPVINKIDLPAADPERVRHEVEDVIGLDASEAVLASAKAGIGIEE 180

Query: 181 ILEQIVEKVPAPTGEVDAPLQALIFDSVYDAYRGVILQVRIVNGMVKPGDKIQMMSNGKT 240 

ILEQIVEKVPAPTG+VDAPLQALIFDSVYDAYRGVILQVRIVNG+VKPGDKIQMMSNGKT
Sbjct: 181 ILEQIVEKVPAPTGDVDAPLQALIFDSVYDAYRGVILQVRIVNGIVKPGDKIQMMSNGKT 240 

Query: 241 FDVTEVGIFTPKAVGRDFLATGDVGYIAASIKTVADTRVGDTITLANNPAIEPLHGYKQM 300 

FDVTEVGIFTPKAVGRDFLATGDVGY+AASIKTVADTRVGDT+TLANNPA E LHGYKQM 
Sbjct: 241 FDVTEVGIFTPKAVGRDFLATGDVGYVAASIKTVADTRVGDTVTLANNPAKEALHGYKQM 300

Query: 301 NPMVFAGLYPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVI 360 

NPMVFAG+YPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVI 
Sbjct: 301 NPMVFAGIYPIESNKYNDLREALEKLQLNDASLQFEPETSQALGFGFRCGFLGLLHMDVI 360

Query: 361 QERLEREFNIDLIMTAPSVVYHVNTTDGEMLEVSNPSEFPDPTRVDSIEEPYVKAQIMVP 420

QERLEREFNIDLIMTAPSVVYHV+TTD +M+EVSNPSEFPDPTRV IEEPYVKAQIMVP
Sbjct: 361 QERLEREFNIDLIMTAPSVVYHVHTTDEDMIEVSNPSEFPDPTRVAFIEEPYVKAQIMVP 420

Query: 421 QEVVGAVMELAQRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDY 480

QEVVGAVMEL+QRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDY
Sbjct: 421 QEVVGAVMELSQRKRGDFVTMDYIDDNRVNVIYQIPLAEIVFDFFDKLKSSTRGYASFDY 480

Query: 481 EISEYRRSQLVKMDILLNGDKVDALSFIVHKEFAYERGKLIVDKLKKIIPRQQFEVPIQA 540

++SEYRRSQLVKMDILLNGDKVDALSFIVHKEFAYERGK+IV+KLKKIIPRQQFEVPIQA
Sbjct: 481 DMSEYRRSQLVKMDILLNGDKVDALSFIVHKEFAYERGKIIVEKLKKIIPRQQFEVPIQA 540 

Query: 541 AIGQKIVARSDIKALRKNVIAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLS 600

AIGQKIVARSDIKALRKNVIAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLS
Sbjct: 541 AIGQKIVARSDIKALRKNVIAKCYGGDVSRKRKLLEKQKAGKKRMKAIGSVEVPQEAFLS 600

Query: 601 VLSMDDDDKK 610

VLSMDDD KK 
Sbjct: 601 VLSMDDDTKK 610 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1370

A DNA sequence (GBSx1455) was identified in S.agalactiae <SEQ ID 4189> which encodes the amino
acid sequence <SEQ ID 4190>. This protein is predicted to be awd gene product (ndk). Analysis of this 
protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2097 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF57188 GB:AE003779 awd gene product [Drosophila melanogaster] 
Identities = 73/136 (53%), Positives = 100/136 (72%), Gaps = 5/136 (3%)

Query: 2 EQTFFMIKPDGVKRGFIGEVISRIERRGFSIDRLETOYADADILKRHYAELTDRPFFPTL 61 

E+TF M+KPDGV+RG +G++I R E++GF + L+ +A ++L++HYA+L+ RPFFP L 
Sbjct: 25 ERTFIMVKPDGVQRGLVGKIIERFEQKGFKLVALKFTWASKELLEKHYADLSARPFFPGL 84 


Query: 62 VDYMTSGPVIIGVISGEEVISTWRTMMGSTNPKDALPGTIRGDFAQAPSPNQATCNIVHG 121 

V+YM SGPV+ V G V+ T R M+G+TNP D+LPGTIRGDF Q NI+HG 

Sbjct: 85 VNYMNSGPVVPMVWEGLNVVKTGRQMLGATNPADSLPGTIRGDFC-----IQVGRNIIHG 139

Query: 122 SDSPESATREIAIWFN 137
SD+ ESA +EIA+WFN 
Sbjct: 140 SDAVESAEKEIALWFN 155

A related DNA sequence was identified in S.pyogenes <SEQ ID 4191> which encodes the amino acid 
sequence <SEQ ID 4192>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2913 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 30/48 (62%), Positives = 35/48 (72%)

Query: 87 MMGSTNPKDALPGTIRGDFAQAPSPNQATCNIVHGSDSPESATREIAI 134

MM TNPKDAL GTIR +FAQAP + N+VHGS S +SA REIA+ 
Sbjct: 1 MMRVTNPKDALCGTIRENFAQAPC-DDGGIFNMVHGSHSRDSARREIAL 48 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1371 

A DNA sequence (GBSx1456) was identified in S.agalactiae <SEQ ID 4193> which encodes the amino
acid sequence <SEQ ID 4194>. Analysis of this protein sequence reveals the following:



Final Results

bacterial cytoplasm Certainty=0.2734 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
A related DNA sequence was identified in S.pyogenes <SEQ ID 4195> which encodes the amino acid 
sequence <SEQ ID 4196>. Analysis of this protein sequence reveals the following:

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1985 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 22/34 (64%), Positives = 26/34 (75%)

Query: 28 SFGTIRNSTALKQLTLDSLNLLSFGTIRNSTALK 61

SFGTI+NS ALKQ + +N SFGTI+NS ALK
Sbjct: 7 SFGTIQNSIALKQKAQEEINQRSFGTIQNSIALK 40 
Identities = 22/34 (64%), Positives = 26/34 (75%)

Query: 6 SFGTIRNSTALKLYAKQSPAFRSFGTIRNSTALK 39

SFGTI+NS ALK A++ RSFGTI+NS ALK 
Sbjct: 7 SFGTIQNSIALKQKAQEEINQRSFGTIQNSIALK 40


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1372 

A DNA sequence (GBSx1457) was identified in S.agalactiae <SEQ ID 4197> which encodes the amino
acid sequence <SEQ ID 4198>. Analysis of this protein sequence reveals the following:
Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1407 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4199> which encodes the amino acid
sequence <SEQ ID 4200>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2055 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 154/221 (69%), Positives = 187/221 (83%)



Query: 1 MIKINFPILDEPLVLSNATILTIEDVSVYSSLVKHFYQYDVDEHLKLFDDKQKSLKATEL 60 
++ +NF +LDEP+ L TIL +EDV V+S +V++ YQY+ D LK FD K K++K +E+ 
Sbjct: 8 LMNLNFSLLDEPIPLRGGTILVLSDVCVFSKIVQYCYQYEEDSELKFFDHKMKTIKESEI 67

Query: 61 MLVTDILGYDVNSAPILKLIHGDLENQFNEKPEVKSMVEKLAATITELIAFECLENELDL 120

MLVTDILG+DVNS + ILKLIH DLE+QFNEKPEVKSM++KL ATITELI FECLENELDL
Sbjct: 68 MLVTDILGFDVNESTILKLIHADLESQFNEKPEVKSMIDKLVATITELIVFECLENELDL 127 

Query: 121 EYDEIKILELIKALGVKIETQSDTIFEKCFEIIQVYHYLTKKNLLVFVNSGAYLTKDEVI 180

EYDEI ILELIK+LGVK+ETQSDTIFEKC EI+Q++ YLTKK LL+FVNSGA+LTKDEV 
Sbjct: 128 EYDEITILELIKSLGVKVETQSDTIFEKCLEILQIFKYLTKKKLLIFVNSGAFLTKDEVA 187

Query: 181 KLCEYINLMQKSVLFLEPRRLYDLPQYVIDKDYFDIGENMV 221 

L EYI+L +VLFLEPR LYD PQY++D+DYFLI +NMV 
Sbjct: 188 SLQEYISLTNLTVLFLEPRELYDFPQYILDEDYFLITKNMV 228

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1373

A DNA sequence (GBSx1458) was identified in S.agalactiae <SEQ ID 4201> which encodes the amino
acid sequence <SEQ ID 4202>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0842 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9783> which encodes amino acid sequence <SEQ ID 9784> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB83918 GB:AL162753 hypothetical protein NMA0629 [Neisseria 
meningitidis Z2491] 
Identities = 45/104 (43%), Positives = 65/104 (62%), Gaps = 2/104 (1%)

Query: 4 RYMRMILMFDMPTETAEERKAYRKFRKFLLSEGFIMHQFSVYSKLLLNNTANNAMIGRLK 63

++MR+I+ FD+P TA +RKA +FR+FLL +G+ M Q SVYS+++ + RL 
Sbjct: 5 KFMRIIVFFDLPVITAAKRKAANQFRQFLLKDGYQMLQLSVYSRIVKGRDSLQKHHNRLC 64

Query: 64 VNNPKKGNITLLTVTEKQFARMVYLHGERNT--SVANSDSRLVF 105 
N P++G+I L +TEKQ+A M L GE T NSD L+F

Sbjct: 65 ANLPQEGSIRCLEITEKQYAAMKLLLGELKTQEKKVNSDQLLLF 108

A related DNA sequence was identified in S.pyogenes <SEQ ID 4203> which encodes the amino acid 
sequence <SEQ ID 4204>. Analysis of this protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0822 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 97/112 (86%), Positives = 107/112 (94%) 
Sbjct: 1 MSYRYMRMILMFDMPTDTAEERKATOKFRKFLLSEGFIKHQFSIYSIO^LLNm'AMNAMIG 60

Query: 61 RLKVNNPKKGNITLLTVTEKQFARMVYLHGERNTSVANSDSRLVFLGDSYDQ 112

RL+ +NP KGNITLLTVTEKQFARM+YLHGERN +ANSD RLVFLG+++D+ 
Sbjct: 61 RLREHNPNKGNITLLTVTEKQFARMIYLHGERNNCIANSDERLVFLGEAFDE 112

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1374 

A DNA sequence (GBSx1459) was identified in S.agalactiae <SEQ ID 4205> which encodes the amino
acid sequence <SEQ ID 4206>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3185 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB83919 GB:AL162753 hypothetical protein NMA0630 [Neisseria
meningitidis Z2491] 
Identities = 71/224 (31%), Positives = 122/224 (53%)

Query: 4 WRTVVVNTHSKLSYKNNHLIFKDSYQTEMIHLSEIDILIMETTDIVLSTMLIKRLVDENI 63

WR++++ KLS + L+ + + ++ + L +I ++I+E + +++ L+ L +
Sbjct: 3 WRSLLIQNGGKLSLQRRQLLICQNGESHTVPLEDIAVIIIEKRETLITAPLLSALAEHGA 62



Query: 124 CSFFEKSQSIMNLYHDLEPFDPSNREGHAARIYFOTLFGNDFSREQDNPINAGLDYGYSL 183 



Query: 184 LLLSMFAREWKCGCMTQFGLKHANQFNQFNLASDIKEPFRPIVD 227

L+AR+ G+ GLH++N FNLA D +EP RP+ D 
Sbjct: 183 LRAAVARALTLYGWLPALGLFHRSELNPFNLADDFIEPLRPLAD 226 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4207> which encodes the amino acid 
sequence <SEQ ID 4208>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm Certainty=0.3185 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 239/289 (82%), Positives = 271/289 (93%) 

Query: 1 MAGWRTVVVNTHSKLSYKNNHLIFKDSYQTEM 60

mGWTVV\MHSKlSYKl^LlFKD+Y+TE+IHLSEIDIL++ElTDIVLSTML+KRLVD 
Sbjct: 1 mGWRTVVVNTHSKLSYKNNHLIFKDAYKTELIHLSEIDILLLETTDIVLSTMLvraLVI) 60 



Query: 61 ENILVIFCDDKRLPTAMLMPYYARHDSSLQLSRQMSWIEDVKADVWTSIIAQKILNQSFY 120 
EN+LVTFCDDKRLPTAMLMP+Y RHDSSLQL +QMSW E VK+ VWT+IIAQKILNQS Y 
Sbjct: 61 ENVLVIFCDDKRLPTAMLMPFYGRHDSSLQLGKQMSWSETVKSQVWTTIIAQKILNQSCY 120



Query: 181 YSLLLSMFAREWKCGCMTQFGLKHANQFNQFNLASDIKEPFRPIVDRIIYENRQSDFVK 240

Y+LLLSMFAREW GCMTQFGLKHANQFNQFN ASDIMEPFRP+VD+I+YENR F K 
Sbjct: 181 YTLLLSMFAREVWSGCMTQFGLKHANQFNQFNFASDIKEPFRPLVDKIVYENRNQPFPK 240


Query: 241 MKRELFSMFSETYSYNGKEMYLSNIVSDYTKKVIKSIMSDGNGIPEFRI 289

+KRELF++FS+T+SYNGKEMYL+NI+SDYTKKV+K+LN++G G+PEFRI 
Sbjct: 241 IKRELFTLFSDTFSYNGKEMYLTNIISDYTKKWKAU.IKEGKGVPEFRI 289

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1375 

A DNA sequence (GBSx1460) was identified in S.agalactiae <SEQ ID 4209> which encodes the amino
acid sequence <SEQ ID 4210>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1109 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB73943 GB:AL139078 hypothetical protein Cj1523c [Campylobacter
jejuni] 

Identities = 165/746 (22%), Positives = 291/746 (38%), Gaps = 115/746 (15%)

PVVLI^IKEYRKVENaLLKKYG-KVHKINIELAREV 484



Y+GE + I +L +IDHI P + DDS N+VLV + +N+ K 4 



++ F R h +TR I 4 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4211> which encodes the amino acid
sequence <SEQ ID 4212>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0973 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 881/1380 (63%), Positives = 1088/1380 (78%), Gaps = 22/1380 (1%)

Query: 1 MNKPYSIGLDIGTNSVGWSIITDDYKVPAKKMRVLGNTDKEYIKKNLIGALLFDGGNTAA 60

M+K YSIGLDIGTNSVGW++ITD+YKVP+KK +VLGNTD+ IKKNLIGALLFD G TA
Sbjct: 1 MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAE 60 




Query: 121 TLQEEKDYHEKFSTIYHLRKEIADKKEKADLRLIYIALAHIIKFRGHFLIEDDSFDVRNT 180

+ +E YHEK+ TIYHLRK+L D +KADLRLIY+ALAH+IKFRGHFLIE D + N+
Sbjct: 121 NIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGD-LNPDNS 179

Query: 181 DISKQYQDFLEIFNTTFENNDLLSQNVDVEAILTDKrSKSAKKDRILAQYPNQKSTGIFA 240 

D+ K + ++ +N FE N + + VD +AIL+ ++SKS + + ++AQ P +K G+F

Sbjct: 180 DVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFG 239

Query: 241 EFLKLIVGIIQADFKKYFNLEDKTPLQFAKDSYDEDLENLLGQIGDEFADLFSAAKKLYDS 300

+ L +G +FK F+L + LQ +KD+YD+DL+NLL QIGD++ADLF AAK L D+

Sbjct: 240 NLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDA 299

Query: 301 VLLSGILTVIDLSTKAPLSASMIQRYDEHREDLKQLKQFVKftSLPEKYQEIFADSSKDGY 360 

+LLS IL V TKAPLSASMI +RYDEH +DL LK V+ LPEKY+EIF D SK+GY
Sbjct: 300 ILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGY 359

Query: 361 AGYIEGKTNQEAFYKYLSKLLTKQ3DSENFLEKIKNEDFLRKQRTFDNGSIPHQVHLTEL 420 

AGYI+G +QE FYK++ +L K + +E L K+ ED LRKQRTFDNGSIPHQ+HL EL
Sbjct: 360 AGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGEL 419



Query: 421 KAIIRRQSEYYPFLKENQDRIEKILTFRIPYYIGPLAREKSDFAWMTRKTDDSIRPWNFE 480
AI+RRQ ++YPFLK+N+++IEKILTFRIPYY+GPLAR S FAWMTRK++++I PWNFE
Sbjct: 420 HAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSEFAWMTRKSEETITPWNFE 479

Query: 481 DLTOKEKSAFAFIHRMTNNDFYLPEEK^ 539 

++VDK SA++FI RMTN D LP EKVLPKHSL+YE FTVYNELTKV+Y E + F 
Sbjct: 480 EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAF 539

Query: 540 FDSNIKQEIFDGVFKEHRKVSKKKLLDFIAKEYEEFRIVDVIGLDKENKAFNASLGTYHD 599 

K+ I D +FK +RKV+ K+L + K+ E F V++ G++ FNASLGTYHD

Sbjct: 540 LSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDR---FNASLGTYHD 596

Query: 600 LEKIL-DKDFLDNPDNESILEDIVQTLTLFEDREMIKKRLENYKDLFTESQLKKLYRRHY 658

L KI+ DKDFLDN +NE ILEDIV TLTLFEDREMI++RL+ Y LF + +K+L RR Y 
Sbjct: 597 LLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRY 656 

Query: 659 TGWGRLSAKLINGIRDKESQKTILDYLIDDGRSNRNFMQLIWDDGLSFKSIISKAQAGSH 718 

TGWGRLS KLINGIRDK+S KTILD+L DG +NRNFMQLI+DD L+FK I KAQ 
Sbjct: 657 TGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQ 716 

Query: 719 SDNLKEVVGELAGSPAIKKGILQSLKIVDELVKVMG-YEPEQIVVEMARENQTTNQGRRN 777

D+L E + LAGSPAIKKGILQ++K+VDELVKVMG ++PE IV+EMARENQTT +G++N 
Sbjct: 717 GDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKN 776

Query: 778 SRQRYKLLDDGVKNLASDLNGNILKEYPTDNQALQNERLFLYYLQNGRDMYTGEALDIDN 837 

SR+R K +++G+K L S ILKE+P +N LQNE+L+LYYLQNGRDMY + LDI+ 

Sbjct: 777 SRERMKRIEEGIKELGS----QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR 832

Query: 838 LSQYDIDHIIPQAFIKDDSIDNRVLVSSAKNRGKSDDVPSLEIVKDCKVFWKKLLDAKLM 897 

LS YD+DHI+PQ+F+KDDSIDN+VL S KNRGKSD+VPS E+VK K +W++LL+AKL+ 
Sbjct: 833 LSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI 892

Query: 898 SQRKYDKLTKAERGGLTSDDKARFIQRQLVETRQITKHVARILDERFNNELDSKGRRIRK 957 

+QRK+DNLTKAERGGL+ DKA FI+RQLVETRQITKHVA+ILD R W + D + IR+ 
Sbjct: 893 TQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIRE 952

Query: 958 VK^Vi'r.KSHLVSNFRKEFGFYKIREin^HHAEDAYMAVVAKAILTKYPQLEPEFVYGD 1017 

VK++TLKS LVS+FRK+F FYK+RE+NNYHHAEDAYLNAW A++ KYP+LE EFVYGD 
Sbjct: 953 VKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGD 1012

Query: 1018 YPKYN-------SYKTRKSATEKLFFYSNIKNFFKTKVTLADGTVVVKDDIEVNNDTGEI 1070

Y Y+ S + AT K FFYSNIMNFFKT++TLA+G + + IE N +TGEI 

Sbjct: 1013 YKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEI 1072

Query: 1071 VWDKKKHFATVRKVLSYPQmiVKKTEIQTGGFSKESILAHGNSDKLIPRKTKDIYLDPK 1130
VWDK + FATVRKVLS PQ NIVKKTE+QTGGFSKESIL NSDKLI RK KD DPK 
Sbjct: 1073 VWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARK-KD--WDPK 1129



Query: 1131 KYGGFDSPIVAYSVLVVADIKKGKAQKLKTVTELLGITIMERSRFEKNPSAFLESKGYLN 1190

KYGGFDSP VAYSVLVVA ++KGK++KLK+V ELLGITIMERS FEKNP FLE+KGY
Sbjct: 1130 KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKE 1189

Query: 1191 IRADKLIILPKYSLFELENGRRRLLASAGELQKGNELALPTQFMKFLYLASRYNESKGKP 1250

++ D +I LPKYSLFELENGR+R+LASAGELQKGNELALP+++ + FLYLAS Y + KG P
Sbjct: 1190 VKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSP 1249

Query: 1251 EEIEKKQEFVNQHVSYFDDILQLINDFSKRVILADANLEKINKLYQDNKENISVDELANN 1310

E+ E+KQ FV QH Y D+I++ I++FSKRVILADANL+K+ Y +++ + E A N
Sbjct: 1250 EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK-PIREQAEN 1308

Query: 1311 IINLFTFTSLGAPAAFKFFDKIVDRKRYTSTKEVLNSTLIHQSITGLYETRIDLGKLGED 1370

II+LFT T+LGAPAAFK+FD +DRKRYTSTKEVL++TLIHQSITGLYETRIDL +LG D
Sbjct: 1309 IIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD 1368



SEQ ID 4210 (GBS317) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 27 (lane 2; MW 179.3kDa) and in Figure 159 (lane 5 & 6; MW 180kDa). It was 
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
27 (lane 3; MW 154.3kDa) and in Figure 159 (lane 9 & 10; MW 154kDa). 
GBS317-GST was purified as shown in Figure 224, lane 9-10. GBS317-His was purified as shown in
Figure 222, lane 9. 

GBS317N was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 149 (lane 2-4; MW 116kDa).

GBS317C was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 166 (lane 6-8; MW 92kDa).

GBS317dN was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 187 (lane 7; MW 116kDa). Purified GBS317dN-GST is shown in Figure 245, lane 8.

GBS317dC was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 188 (lane 13; MW 92kDa). Purified GBS317dC-GST is shown in Figure 245, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1376 

A DNA sequence (GBSx1461) was identified in S.agalactiae <SEQ ID 4213> which encodes the amino
acid sequence <SEQ ID 4214>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -11.94 Transmembrane 132 - 148 (123 - 156)
INTEGRAL Likelihood = -11.09 Transmembrane 190 - 206 (183 - 209)
INTEGRAL Likelihood = -4.94 Transmembrane 95 - 111 (94 - 115)

Final Results 

bacterial membrane --- Certainty=0.5776 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related sequence was also identified in GAS <SEQ ID 9133> which encodes the amino acid sequence 
<SEQ ID 9134>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.32 Transmembrane 126 - 142 
INTEGRAL Likelihood = -6.90 Transmembrane 178 - 194 

Final Results 

bacterial membrane Certainty=0.3930 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/204 (46%), Positives = 139/204 (68%)

Query: 5 LMKDKLLWLTWIWIISLATLATIYIAWLIYPIEIQFLKLEKVVYLKAETIYYISIFNKLMI 64

+M + ++ +W+W+++LA L TIY WL YP+E+ LKLE+W++ + I +N+N L+ 
Sbjct: 4 WWENTKLLCSWVWLLALAILITIYSTKLWYPLEVDHLKLEQWFMSKDAIIjHNYNGLIiN 63 

Query: 65 YLTHPFISDLNMPSFESSEDGLKHFADVKYLFTLAHGLFVILTFPVIYFLRRGWKQKSIF 124 

YLT+PF++ L +F SS DGLKHFADVK+LF L +F+ L +P + + K K + 
Sbjct: 64 YLTNPFVTRLEFANFHSSADGLKHFADVKWLFHLTQVVFLGLLYPTLKTFTQRLKTKRFW 123
Query: 125 LYEGFFKIAIMLPIFIVVCAFLLGFDQFFTLFHEVLFPGDSTWQFNPLTDPVIWILPETF 184

L + +A + P+ I + A +GF+ FFTLFH+VLF GDS+W F+PL D VIWILPE F 
Sbjct: 124 LLQKPLILAALFPLMIGLMASFIGFEHFFTLFHQVLFVGDSSWLFDPLKDSVIWILPEVF 183 

Query: 185 FLHCFIIFLLIYETITIILLIIGR 208

FLHCF+ F+++YE I L+ + R
Sbjct: 184 FLHCFLFFMIVYEIILWSLVGLAR 207 



SEQ ID 4214 (GBS167) was expressed in and purified from E.coli. The purified protein is shown in lanes 5
& 6 of Figure 223. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1377 

A DNA sequence (GBSx1462) was identified in S.agalactiae <SEQ ID 4217> which encodes the amino
acid sequence <SEQ ID 4218>. This protein is predicted to be p-nitrophenyl phosphatase (pho2). Analysis
of this protein sequence reveals the following: 

Possible site: 48 

»> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3925 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15219 GB:Z99120 similar to N-acetylglucosamine catabolism
[Bacillus subtilis] 
Identities = 121/249 (48%), Positives = 172/249 (69%)

Query: 3 YKGYLIDLDGTIYKGKSRIPAGERFIERLQEKGIPYMLVTNNTTRTPESVQEMLRGFNVE 62
YKGYLIDLDGT+Y G +I F+ L+++G+PY+ VTNN++RTP+ V + L F++
Sbjct: 4 YKGYLIDLDGTMYNGTEKIEEACEFVRTLKDRGVPYLFVTNNSSRTPKQVADKLVSFDIP 63

Query: 63 TPLETIYTATMATVDYMNDMNRGKTAYVIGEEGLKKAIADAGYVEDTKNPAYVVVGLDWN 122
E ++T +MAT ++ + + YVIGEEG+++AI + G +N +VVVG+D +
Sbjct: 64 ATEEQVFTTSMATAQHIAQQKKDASVYVIGEEGIRQAIEENGLTFGGENADFVVVGIDRS 123

Query: 123 VTYDKIATATLAIQNGALFIGTNPDLNIPTERGLLPGAGSLNALLEAATRIKPVFIGKPN 182
+TY+K A LAI+NGA FI TN D+ IPTERGLLPG GSL ++L +T ++PVFIGKP
Sbjct: 124 ITYEKFAVGCLAIRNGARFISTNGDIAIPTERGLLPGNGSLTSVLTVSTGVQPVFIGKPE 183

Query: 183 AIIMNKALEILNIPRNQAVMVGDNYLTDIMAGINNDIDTLLVTTGFTTVEEVPDLPIQPS 242
+IIM +A+ +L ++ +MVGDNY TDIMAGIN +DTLLV TG T E + D +P+
Sbjct: 184 SIIMEQAMRVLGTDVSETLMVGDNYATDIMAGINAGMDTLLVHTGVTKREHMTDDMEKPT 243



Query: 243 YVLASLDEW 251 

+ + SL EW 
Sbjct: 244 HAIDSLTEW 252 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4219> which encodes the amino acid 
sequence <SEQ ID 4220>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.53 Transmembrane 128 - 144 (128 - 144)

Final Results 

bacterial membrane Certainty=0.1213 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15219 GB:Z99120 similar to N-acetylglucosamine catabolism
[Bacillus subtilis] 
Identities = 121/250 (48%), Positives = 166/250 (66%), Gaps = 1/250 (0%)

Query: 3 YKGYLIDLDGTIYQGKNRIPAGERFIKRLQERGIPYLLVTNNTTRTPEMVQSMLANQFHV 62

YKGYLIDLDGT+Y G +I F++ L++RG+PYL VTNN++RTP+ V L + F +

Sbjct: 4 YKGYLIDLDGTMYNGTEKIEEACEFVRTLKDRGVPYLFVTNNSSRTPKQVADKLVS-FDI 62

Query: 63 ETSIETIYTATMATVDYMNDMNRGKTAYVIGETGLKSAIAAAGYVEELENPAYVVVGLDS 122

+ E ++T +MAT ++ + + YVIGE G++ AI G EN +VVVG+D

Sbjct: 63 PATEEQVFTTSMATAQHIAQQKKDASVYVIGEEGIRQAIEENGLTFGGENADFVVVGIDR 122

Query: 123 QVTYEMLAIATLAIQKGALFIGTNPDLNIPTERGLMPGAGALNALLEAATRVKPVFIGKP 182

+TYE A+ LAI+ GA FI TN D+ IPTERGL+PG G+L ++L +T V+PVFIGKP
Sbjct: 123 SITYEKFAVGCLAIRNGARFISTNGDIAIPTERGLLPGNGSLTSVLTVSTGVQPVFIGKP 182 

Query: 183 NAIIMNKSLEVLGIQRSEAVMVGDNYLTDIMAGIQNDIATILVTTGFTRPEEVPTLPIQP 242 

+IIM +++ VLG SE +MVGDNY TDIMAGI + T+LV TG T+ E + +P 
Sbjct: 183 ESIIMEQAMRVLGTDVSETLMVGDNYATDIMAGINAGMDTLLVHTGVTKREHMTDDMEKP 242

Query: 243 DHVLSSLDEW 252 

H + SL EW 
Sbjct: 243 THAIDSLTEW 252 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 207/250 (82%), Positives = 227/250 (90%), Gaps = 1/250 (0%) 

Query: 3 YKGYLIDLDGTIYKGKSRIPAGERFIERLQEKGIPYMLVTNNTTRTPESVQEMLRG-FNV 61 

YKGYLIDLDGTIY+GK+RIPAGERFI+RLQE+GIPY+LVTNNTTRTPE VQ ML F+V
Sbjct: 3 YKGYLIDLDGTIYQGKNRIPAGERFIKRLQERGIPYLLVTNNTTRTPEMVQSMLANQFHV 62 

Query: 62 ETPLETIYTATMATVDYMNDMNRGKTAYVIGEEGLKKAIADAGYVEDTKNPAYVVVGLDW 121

ET +ETIYTATMATVDYMNDMNRGKTAYVIGE GLK AIA AGYVE+ +NPAYVVVGLD
Sbjct: 63 ETSIETIYTATMATVDYMNDMNRGKTAYVIGETGLKSAIAAAGYVEELENPAYVVVGLDS 122

Query: 122 NVTYDKIATATLAIQNGALFIGTNPDLNIPTERGLLPGAGSLNALLEAATRIKPVFIGKP 181

VTY+ LA ATLAIQ GALFIGTNPDLNIPTERGL+PGAG+LNALLEAATR+KPVFIGKP
Sbjct: 123 QVTYEMLAIATLAIQKGALFIGTNPDLNIPTERGLMPGAGALNALLEAATRVKPVFIGKP 182

Query: 182 NAIIMNKALEILNIPRNQAVMVGDNYLTDIMAGINNDIDTLLVTTGFTTVEEVPDLPIQP 241

NAIIMNK+LE+L I R++AVMVGDNYLTDIMAGI NDI T+LVTTGFT EEVP LPIQP
Sbjct: 183 NAIIMNKSLEVLGIQRSEAVMVGDNYLTDIMAGIQNDIATILVTTGFTRPEEVPTLPIQP 242

Query: 242 SYVLASLDEW 251 

+VL+SLDEW 
Sbjct: 243 DHVLSSLDEW 252 

A similar DNA sequence was identified in S.pyogenes <SEQ ID 4215> which encodes amino acid sequence
<SEQ ID 4216>. An alignment of the GAS and GBS sequences follows: 

Identities = 94/204 (46%), Positives = 139/204 (68%)

Query: 4 VMVENTKLLCSWVmjJ^ILITIYSTWI^ 63 

+M + ++ +W+W+++LA L TIY WL YP+E+ LKLE+W++ + I +N+N L+ 
Sbjct: 5 LMKDKLLWLTWIWIISLATLATIYIAWLIYPIEIQFLKLEKVVYLKAETIYYNFNKLMI 64 

Query: 64 YLTNPFVTRLEFANFHSSADGLKHFADVKWLFHLTQVVFLGLLYPTLKTFTQRLKTKRFW 123 

YLT+PF++ L +F SS DGLKHFADVK+LF L +F+ L +P + + K K + 
Sbjct: 65 YLTHPFISDLNMPSFPSSEDGLKHFADVKYLFTLAHGLFVILTFPVIYFLRRGWKQKSIF 124

Query: 124 LLQKPLILAALFPLMIGLMASFIGFEHFFTLFHQVLFVGDSSWLFDPLKDSVIWILPEVF 183 
L + +A + P+ I + A +GF+ FFTLFH+VLF GDS+W F+PL D VIWILPE F
Sbjct: 125 LYEGFFKIAIMLPIFIVVCAFLLGFDQFFTLFHEVLFPGDSTWQFNPLTDPVIWILPETF 184

Query: 184 FLHCFLFFMIVYEIILWSLVGLAR 207 

FLHCF+ F+++YE I L+ + R 
Sbjct: 185 FLHCFIIFLLIYETITIILLIIGR 208

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1378 

A DNA sequence (GBSx1463) was identified in S.agalactiae <SEQ ID 4221> which encodes the amino
acid sequence <SEQ ID 4222>. This protein is predicted to be oleoyl-acyl carrier protein thioesterase.
Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3332 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB02069 GB:AB026647 acyl carrier protein thioesterase 
[Arabidopsis thaliana] 
Identities = 59/248 (23%), Positives = 104/248 (41%), Gaps = 30/248 (12%) 








- - 1 VFKRYGLV 56 



TR++ V DD+ Y ++ +KK+ PK + R 

QDTRRLQKVSDDVRDEYLVFCPQEPRLAFPEENNRSLKKI PKLEDPAQYSMIGLKPR 257 



DLDMN HVNN Y+ W+ -t 



TRHDIIGG 228 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4223> which encodes the amino acid 
sequence <SEQ ID 4224>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.88 Transmembrane 21 - 37 ( 21 - 38) 

Final Results 

bacterial membrane Certainty=0.2550 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 
>GP:AAB71730 GB:U65643 acyl-ACP thioesterase [Myristica fragrans]
Identities = 41/128 (32%), Positives = 67/128 (52%), Gaps = 11/128 (8%)

Query: 33 FIFMIKRGGLLVDILAYFALLNPDTRKVATIPEDLVAPFETDFVKKLHRV-----PKMPL 87

F+ K G +L + + ++N TR+++ IPE++ E FV+ H V K+P 
Sbjct: 147 FLRDCKTGEILTRATSVWVMMNKRTRRLSKIPEEVRVEIEPYFVE--HGVLDEDSRKLPK 204

Query: 88 LEQS IDRDYYVRYFDIDMNGHVNNSKYLDWMYDVLGCEFLKTHQPLKMTLKYVKEV 143 

L + I R R+ D+D+N HVNN KY+ W+ + + L++H+ MTL+Y KE 
Sbjct: 205 l^OTANYIRRGIAPRWSDI^WQHWWKYIGWILESVPSSLLESHELYGMTLEYRKEC 264 

Query: 144 SPGGQITS 151 
G + S 

Sbjct: 265 GKDGLLQS 272 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 62/144 (43%), Positives = 94/144 (65%)

Query: 101 GQKIITISSAFVLMDFKTRKIHPVLDDITSIYQSQRIKKVIRGPKYHP1GDSKVKQYHVR 160 

G ++ I + F L++ TRK+ + +D+ + +++ +KK+ R PK + S + Y+VR
Sbjct: 40 GGLLVDILAYFALLNPDTRKVATIPEDLVAPFETDFVKKLHRVPKMPLLEQSIDRDYYVR 99 

Query: 161 YFDLDMNGHVNNSKYLEWMYDVLDLDFLSSHIPKKIDLKYIKEIQYGTDIKSHWYQDGLV 220 

YFD+DMNGHVNNSKYL+WMYDVL +FL +H P K+ LKY+KE+ G I S ++ D L 
Sbjct: 100 YFDIDMNGHVNNSKYLDWMYDVLGCEFLKTHQPLKMTLKYVKEVSPGGQITSSYHljDQLT 159

Query: 221 TRHDIIGGDAIHAQARIEWQEKKE 244 

+ H I ++AQA IEW+ K+

Sbjct: 160 SYHQITSDGQLNAQAMIEWRAIKQ 183 
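Each summary line gives a count over the alignment length followed by a whole-number percentage. Several entries suggest the fraction is truncated rather than rounded: for example, Positives = 104/248 is printed as 41% although 104/248 is about 41.9%. A minimal Python sketch under that assumption (the helper names are illustrative):

```python
import re
from typing import Dict, Tuple


def parse_summary(line: str) -> Dict[str, Tuple[int, int]]:
    """Extract 'Name = count/length' pairs from a summary line."""
    return {
        name: (int(count), int(length))
        for name, count, length in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line)
    }


def percent(count: int, length: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // length


summary = parse_summary(
    "Identities = 59/248 (23%), Positives = 104/248 (41%), Gaps = 30/248 (12%)"
)
for name, (count, length) in summary.items():
    # Identities: 59/248 -> 23%, Positives: 104/248 -> 41%, Gaps: 30/248 -> 12%
    print(f"{name}: {count}/{length} -> {percent(count, length)}%")
```

Integer floor division avoids floating-point edge cases when a fraction lands exactly on a whole percentage.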

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1379 

A DNA sequence (GBSx1464) was identified in S.agalactiae <SEQ ID 4225> which encodes the amino
acid sequence <SEQ ID 4226>. This protein is predicted to be coproporphyrinogen III oxidase. Analysis of 
this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1484 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05062 GB:AP001511 coproporphyrinogen III oxidase [Bacillus halodurans] 
Identities = 173/375 (46%), Positives = 248/375 (66%), Gaps = 5/375 (1%) 

Query: 5 PTSAYVHIPFCTQICYYCDFSKVFIKNQPVDAYLQALIREFR SYDITELRTLYIGG 60

P +AY+HIPFC ICYYCDF+K ++KNQPV+ YLQAL E L+TLY+GG

Sbjct: 2 PKAAYIHIPFCEHICYYCDFNKFYLKNQPWIJEYLQ^ETEMAMWAEQPTKSLQTLYVGG 61 

Query: 61 GTPTSISAVQLDYLLTELSRDLNLNTLEEFTIEANPGDLTVDKIEVLQKSAVNRVSLGVQ 120
GTPT+++A QL LL + R L L+ LEEFT E NP + +K++VL+ V+R+S+GVQ
Sbjct: 62 GTPTALTADQLAQLLASIKRTLPLSDLEEFTFEVNPDSIDEEKLDVLRSYGVDRLSIGVQ 121

Query: 121 TFNDKHLKRIGRSHNEAQIYSTIDALKTAGFQNISIDLIYALPGQTMDDVRSNVAKALSL 180

F LK IGR+H++ + ++ + AGF N+S+DL+ LP QT + + +A +L

Sbjct: 122 AFQPLLLKEIGRTHDQKSVEQAVEKSRQAGFANLSLDLMLGLPKQTPEMFAETLKEAFAL 181
Query: 181 ...
HLS YSL +E TVF N+ R+G+L LP ED E +M+ + E E++GF+ YEISNF
Sbjct: 182 ...

Query: 241 ... 300
K G+ESRHNL+YW+N EYYG GAGA GY+ G+RY N GP+
Sbjct: 242 ...

Query: ... 360
MEE++FLGLRK+ GV
Sbjct: 302 ...

Query: ...
+GL LG+ V E+F+
Sbjct: 362 DEGLLLGNEVFEQFL 376

A related DNA sequence was identified in S.pyogenes <SEQ ID 4227> which encodes the amino acid
sequence <SEQ ID 4228>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3202 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 304/376 (80%), Positives = 343/376 (90%) 

Query: 1 MLKKPTSAYVHIPFCTQICYYCDFSKVFIKNQPVDAYLQALIREFRSYDITELRTLYIGG 60
M KKPTSAYVHIPFCTQICYYCDFSKVFI+NQPVDAYL+ALI+EF SY I +L+TLYIGG 
Sbjct: 33 MSKKPTSAYVHIPFCTQICYYCDFSKVFIQNQPVDAYLKALIQEFDSYGIRDLKTLYIGG 92 

Query: 61 GTPTSISAVQLDYLLTELSRDLNLNTLEEFTIEANPGDLTVDKIEVLQKSAVNRVSLGVQ 120

GTPT+I+A QL+YLL L R+LNL+ LEEFTIEANPGDLT +KI VLQ+SAVNR+SLGVQ
Sbjct: 93 GTPTAITAKQLEYLIMJLERNLHLDDLEEFTIEANPGDLTPEKIAVLQRSAVNRISLGVQ 152 

Query: 121 TFNDKHLKRIGRSHNEAQIYSTIDALKTAGFQNISIDLIYALPGQTMDDVRSNVAKALSL 180

TFN+K LK+IGRSHNE QIYSTI LKTAGF NISIDLIYALPGQT+D V+ NVAKAL+L
Sbjct: 153 TFNNKQLKQIGRSHNEEQIYSTIANLKTAGFHNISIDLIYALPGQTLDQVKENVAKALAL 212

Query: 181 NIPHLSLYSLILEHHTVFMNKMRRGKLHLPTEDLEAEMFEYIISEMERNGFEHYEISNFT 240 

+IPHLSLYSLILEHHTVFMNKMRRGKL+LPTEDLEAEMFEYIISEME NGFEHYEISNFT
Sbjct: 213 DIPHLSLYSLILEHHTVFMNKMRRGKLNLPTEDLEAEMFEYIISEMEANGFEHYEISNFT 272

Query: 241 KPGFESRHNLMYWDNVEYYGVGAGASGYLDGIRYRNRGPIQHYLKGVSEGNARLSEEVLS 300 

KPGFESRHNLMYWDNVEY+G GAGASGYL+GIRY+NR PIQHYLK V GNARL+EEVL
Sbjct: 273 KPGFESRHNLMYWDNVEYFGCGAGASGYLNGIRYQNRVPIQHYLKAVEAGNARLNEEVLR 332

Query: 301 KNEMMEEELFLGLRKKEGVSIGKFEQKFGTSFEPCRYGQIVQELQSDGLLKENNGFIQMTK 360 

K EMMEEELFLGLRKK GVSI +F++KFG SFE+RYG IV+ELQ+ GLL +++ F++MTK
Sbjct: 333 KEEMMEEELFLGLRKKTGVSIQRFQEKFGMSFEERYGNIVRELQNQGLLVKDDAFVRMTK 392

Query: 361 KGLFLGDTVAEKFIVE 376 

KGLFLGD+VAE+FI++ 
Sbjct: 393 KGLFLGDSVAERFILD 408 
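In each alignment block the middle line marks identical residues with the residue letter and positive-scoring substitutions with "+", leaving mismatches and gap columns blank. The Python sketch below tallies a midline under that convention. Because leading and trailing blanks are often lost when such text is extracted, it reports counts only, not the alignment length. Applied to the final row above (`KGLFLGD+VAE+FI++`), it gives 12 identities and 16 positives across those 16 columns.

```python
from typing import Tuple


def count_midline(midline: str) -> Tuple[int, int]:
    """Return (identities, positives) for one alignment midline.

    Letters mark identical residues; '+' marks a positive-scoring substitution.
    Positives include identities, as in the 'Positives = n/N' summaries.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


print(count_midline("KGLFLGD+VAE+FI++"))  # (12, 16)
```

Summing these per-row counts over a whole alignment should approach the numerators in the "Identities" and "Positives" lines, provided every midline survived extraction intact.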



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1380

A DNA sequence (GBSx1465) was identified in S.agalactiae <SEQ ID 4229> which encodes the amino
acid sequence <SEQ ID 4230>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3729 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1381 

A DNA sequence (GBSx1466) was identified in S.agalactiae <SEQ ID 4231> which encodes the amino
acid sequence <SEQ ID 4232>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2989 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4233> which encodes the amino acid 
sequence <SEQ ID 4234>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2993 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 36/109 (33%), Positives = 58/109 (53%), Gaps = 6/109 (5%)

Query: 9 WAKHKYljVLSKSQKIYLDIRQTLKSPNCT VLDVQSLIDQAVLLEESPSQVTNAYMHI 65 

WA KY V++ SQ+ Y +R+ K + VL LI++A + + + AY H+ 
Sbjct: 13 WAYQKYWmAHSQQHYNALRKLFKGNQWSEEKVLTFHCLXEEAQAIPPTVKSLRTAYQHV 72 

Query: 66 WGYFKMKAERQEKEEFLTLLEKYRKTGYQRRKLLAFLKQLLAKYPNSYL 114

WGYFK A ++EK+ F L + + ++L FL+++ A Y SYL 

Sbjct: 73 WGYFKKVASQEEKDHFKDLDAQLET KSEEMLCFLQEMTAHYQPSYL 118 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1382

A DNA sequence (GBSx1467) was identified in S.agalactiae <SEQ ID 4235> which encodes the amino
acid sequence <SEQ ID 4236>. This protein is predicted to be mrsA (mrsA). Analysis of this protein
sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.96 Transmembrane 56 - 72 ( 56 - 72) 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11970 GB:Z99105 similar to phosphoglucomutase (glycolysis)
[Bacillus subtilis]
Identities = 284/451 (62%), Positives = 353/451 (77%), Gaps = 4/451 (0%)

Query: 1 MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHETDRPRVFVARDTRISGEMLESA 60

MGKYFGTDGVRG AN ELTPELAFK+GRFGGYVL++ + RP+V + RDTRISG MLE A
Sbjct: 1 MGKYFGTDGVRGVANSELTPELAFKVGRFGGYVLTK-DKQRPKVLIGRDTRISGHMLEGA 59

Query: 61 LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFGSDGFKL 120 

L+AGLLS+G EV +LGV++TPGVSYL + A AGVMISASHNP DNGIKFFG DGFKL
Sbjct: 60 LVAGLLSIGAEVMRLGVISTPGVSYLTKAMDAEAGVMISASHNPVQDNGIKFFGGDGFKL 119

Query: 121 DDDRELEIEALLDAKEDTLPRPSAQGLGTLVDYPEGLRKYEKFMESTGI-DLEGMKVALD 179

D++E EIE L+D ED LPRP LG + DY EG +KY +F++ T D G+ VALD 
Sbjct: 120 SDEQEAEIERLMDEPEDKLPRPVGADLGLVNDYFEGGQKYLQFLKQTADEDFTGIHVALD 179 

Query: 180 TANGAATASARNIFLDLNADISVIGDQPDGLNINDGVGSTHPEQLQSLVRENGSDIGLAF 239

ANGA ++ A ++F DL+AD+S +G P+GLNINDGVGSTHPE L + V+E +D+GLAF
Sbjct: 180 CANGATSSIATHLFADLDADVSTMGTSPNGLNINDGVGSTHPEALSAFVKEKNADLGLAF 239

Query: 240 DGDSDRLIAVDENGEIVDGDKIMFIIGKYLSDKGQLAQNTIVTTVMSNLGFHKALDREGI 299 

DGD DRLIAVDE G IVDGD+IM+I K+L +G+L +T+V+TVMSNLGF+KAL++EGI 
Sbjct: 240 DGDGDRLIAVDEKGNIVDGDQIMYICSKHLKSEGRLKDDTVVSTVMSNLGFYKALEKEGI 299

Query: 300 HKAITAVGDRYVVEEMRKSGYNLGGEQSGHVIIMDYNTTGDGQLTAIQLTKVMKETGKKL 359

TAVGDRYVVE M+K GYN+GGEQSGH+I +DYNTTGDG L+AI L +K TGK L
Sbjct: 300 KSVQTAVGDRYVVEAMKKDGYNVGGEQSGHLIFLDYNTTGDGLLSAIMLMNTLKATGKPL 359

Query: 360 SELASEVTIYPQKLVNIRVENNMKDKAMEVPAIAEIIAKMEEEMDGNGRILVRPSGTEPL 419 

SELA+E+ +PQ LVN+RV + K K E + +I+++E+EM+G+GRILVRPSGTEPL 
Sbjct: 360 SELAAEMQKFPQLLVNVRVTD--KYKVEENEKVKAVISEVEKEMNGDGRILVRPSGTEPL 417

Query: 420 LRVMAEAPTNEAVDYYVDTIADVVRTEIGLD 450

+RVMAEA T+E D YV+ I +VVR+E+GL+
Sbjct: 418 VRVMAEAKTKELCDEYVNRIVEVVRSEMGLE 448
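Every alignment row carries its start and end coordinates, so the number of residues in the row (gap characters "-" excluded) should equal end - start + 1. The Python sketch below applies that consistency check to a single row. It assumes the row holds one unbroken string of residues and gaps, so stray spaces introduced during extraction would need to be removed first. A mismatch can flag rows damaged in extraction, for instance where two adjacent residues were merged into one character.

```python
import re

# Assumed row layout: "<Query|Sbjct>: <start> <residues and '-' gaps> <end>"
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+([A-Za-z-]+)\s+(\d+)$")


def row_is_consistent(row: str) -> bool:
    """Check that the residue count matches the row's start/end coordinates."""
    match = ROW.match(row.strip())
    if match is None:
        raise ValueError(f"not a single-segment alignment row: {row!r}")
    _, start, residues, end = match.groups()
    n_residues = len(residues.replace("-", ""))
    return int(start) + n_residues - 1 == int(end)


print(row_is_consistent("Sbjct: 393 KGLFLGDSVAERFILD 408"))  # True
print(row_is_consistent(
    "Sbjct: 360 SELAAEMQKFPQLLVNVRVTD--KYKVEENEKVKAVISEVEKEMNGDGRILVRPSGTEPL 417"
))  # True
```

Raising on rows that do not match the pattern, rather than returning False, keeps "malformed row" separate from "coordinates disagree".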

A related DNA sequence was identified in S.pyogenes <SEQ ID 4237> which encodes the amino acid
sequence <SEQ ID 4238>. Analysis of this protein sequence reveals the following:

Possible site: 35 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.96 Transmembrane 56 - 72 ( 56 - 72) 



Final Results 



bacterial membrane Certainty=0.1383 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:CAB11970 GB:Z99105 similar to phosphoglucomutase (glycolysis)
[Bacillus subtilis]
Identities = 287/451 (63%), Positives = 346/451 (76%), Gaps = 4/451 (0%)

Query: 1 MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHETERPKVFVARDTRISGEMLESA 60

MGKYFGTDGVRG AN ELTPELAFK+GRFGGYVL++ + +RPKV + RDTRISG MLE A
Sbjct: 1 MGKYFGTDGVRGVANSELTPELAFKVGRFGGYVLTK-DKQRPKVLIGRDTRISGHMLEGA 59

Query: 61 LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFGNDGFKL 120

L+AGLLS+G EV +LGV++TPGVSYL + A AGVMISASHNP DNGIKFFG DGFKL
Sbjct: 60 LVAGLLSIGAEVMRLGVISTPGVSYLTKAMDAEAGVMISASHNPVQDNGIKFFGGDGFKL 119

Query: 121 ADDQELEIEALLDAPEDTLPRPSAEGLGTLVDYPEGLRKYEKFLVTTGT-DLSGMTVALD 179

+D+QE EIE L+D PED LPRP LG + DY EG +KY +FL T D +G+ VALD
Sbjct: 120 SDEQEAEIERLMDEPEDKLPRPVGADLGLVNDYFEGGQKYLQFLKQTADEDFTGIHVALD 179



Query: 240 DGDSDRLIAVDETGEIVDGDRIMFIIGKYLSEKGLLAHNTIVTTVMSNLGFHKALDKQGI 299

DGD DRLIAVDE G IVDGD+IM+I K+L +G L +T+V+TVMSNLGF+KAL+K+GI
Sbjct: 240 DGDGDRLIAVDEKGNIVDGDQIMYICSKHLKSEGRLKDDTVVSTVMSNLGFYKALEKEGI 299

Query: 300 NKAITAVGDRYVVEEMRSSGYNLGGEQSGHVIIMDYNTTGDGQLTAIQLAKVMKETGKSL 359

TAVGDRYVVE M+ GYN+GGEQSGH+I +DYNTTGDG L+AI L +K TGK L
Sbjct: 300 KSVQTAVGDRYVVEAMKKDGYNVGGEQSGHLIFLDYNTTGDGLLSAIMLMNTLKATGKPL 359

Query: 360 SELAAEVTIYPQKLVNIRVENSMKERAMEVPAIANIIAKMEDEMAGNGRILVRPSGTEPL 419

SELAAE+ +PQ LVN+RV + K + E + +I+++E EM G+GRILVRPSGTEPL
Sbjct: 360 SELAAEMQKFPQLLVNVRVTD--KYKVEENEKVKAVISEVEKEMNGDGRILVRPSGTEPL 417



An alignment of the GAS and GBS proteins is shown below. 

Identities = 400/450 (88%), Positives = 429/450 (94%) 

Query: 1 MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHETDRPRVFVARDTRISGEMLESA 60

MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHET+RP+VFVARDTRISGEMLESA
Sbjct: 1 MGKYFGTDGVRGEANVELTPELAFKLGRFGGYVLSQHETERPKVFVARDTRISGEMLESA 60

Query: 61 LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFGSDGFKL 120 

LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFG+DGFKL 
Sbjct: 61 LIAGLLSVGIEVYKLGVLATPGVSYLVRTEKASAGVMISASHNPALDNGIKFFGNDGFKL 120

Query: 121 DDDRELEIEALLDAKEDTLPRPSAQGLGTLVDYPEGLRKYEKFMESTGIDLEGMKVALDT 180

DD+ELEIEALLDA EDTLPRPSA+GLGTLVDYPEGLRKYEKF+ +TG DL GM VALDT
Sbjct: 121 ADDQELEIEALLDAPEDTLPRPSAEGLGTLVDYPEGLRKYEKFLVTTGTDLSGMTVALDT 180

Query: 181 ANGAATASARNIFLDLNADISVIGDQPDGLNINDGVGSTHPEQLQSLVRENGSDIGLAFD 240 

ANGAA+ SAR++FLDLNA+I+VIG++P+GLNINDGVGST PEQLQ LV+E G+D+GLAFD
Sbjct: 181 ANGAASVSARDVFLDLNAEIAVIGEKPNGLNINDGVGSTRPEQLQELVKETGADLGLAFD 240

Query: 241 GDSDRLIAVDENGEIVDGDKIMFIIGKYLSDKGQLAQNTIVTTVMSNLGFHKALDREGIH 300

GDSDRLIAVDE GEIVDGD+IMFIIGKYLS+KG LA NTIVTTVMSNLGFHKALD++GI+
Sbjct: 241 GDSDRLIAVDETGEIVDGDRIMFIIGKYLSEKGLLAHNTIVTTVMSNLGFHKALDKQGIN 300

Query: 301 KAITAVGDRYVVEEMRKSGYNLGGEQSGHVIIMDYNTTGDGQLTAIQLTKVMKETGKKLS 360

KAITAVGDRYVVEEMR SGYNLGGEQSGHVIIMDYNTTGDGQLTAIQL KVMKETGK LS
Sbjct: 301 KAITAVGDRYVVEEMRSSGYNLGGEQSGHVIIMDYNTTGDGQLTAIQLAKVMKETGKSLS 360






Query: 361 ELASEVTIYPQKLVNIRVENNMKDKAMEVPAIAEIIAKMEEEMDGNGRILVRPSGTEPLL 420
ELA+EVTIYPQKLVNIRVEN+MK++AMEVPAIA IIAKME+EM GNGRILVRPSGTEPLL
Sbjct: 361 ELAAEVTIYPQKLVNIRVENSMKERAMEVPAIANIIAKMEDEMAGNGRILVRPSGTEPLL 420

Query: 421 RVMAEAPTNEAVDYYVDTIADVVRTEIGLD 450
RVMAEAPT+ VDYYVDTIADVVRTEIG D
Sbjct: 421 RVMAEAPTDAEVDYYVDTIADVVRTEIGCD 450

SEQ ID 4236 (GBS402) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 84 (lane 5; MW 78kDa). 

GBS402-GST was purified as shown in Figure 218, lanes 3-5.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1383 

A DNA sequence (GBSx1468) was identified in S.agalactiae <SEQ ID 4239> which encodes the amino
acid sequence <SEQ ID 4240>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11969 GB:Z99105 ybbR [Bacillus subtilis]
Identities = 90/324 (27%), Positives = 167/324 (50%), Gaps = 18/324 (5%)





Query: 1 MKKFFTNKFWLGVVSLFIAILLFLTATATSMNHQDNSKIAG ASETYTHTLTDVPI 55
M KF N++ + +++L A+LL++ A + N KG ST TLTD+P+
Sbjct: 1 MDKFLNNRWAVKIIALLFALLLYV---AVNSNQAPTPKKPGESFFPTSTTDEATLTDIPV 57

Query: 56 DIKYDSDDYFISGYSYGADVYMS-SVNRVKLDSEINEDTRKFKVVADLTNMKPGTHKVPL 114
YD ++Y ++G +V + S + VK + T+ F++ AD+ ++K GTHKV L
Sbjct: 58 KAYYDDENYVVTGVPQTVNVT IKGSTSAVKKARQ TKNFE I YADMEHLKTGTHKVEL

Query: 115 KVVNLPSGVNATVSPTTITVTMGKKKT... 173
K N+ G+ +++P+ TVT+ ++ TK FPV +K ++K GY+ ++ V V++
Sbjct: 114 KAKNVSDGLTISINPSVTTVTIQERTTKSFPVEVEYYNKSKMKKGYSPEQPIVSPKNVQI 173

Query: 174 TSDESIIDRIDHVAANIPDDKVLDDDFNKTVTLQAVTADGTVLASIIHPSKATLSVKVKK 233
T +++ID I A++ + D+ K + DG L + PS ++V V
Sbjct: 174 TGSKMVIDNISLHKaSVmjENA-DETIEKEAKVTWDKDGNALPVDVEPSVIiaWPVTS 232

Query: 234 LTKTVPINLIPVGQFSDSISKINYKLSQEKAVISGTKEALEAISVIN-AEVDISDVTKNT 292
+K VP + G D +S N + S + + G+++ L+++ I+ +D+S + K++
Sbjct: 233 PSKKVPFKIERTGSLPDGVSIANIESSPSEVTVYGSQDVLDSLEFIDGVSLDliSKINKDS 292

Query: 293 --EKKINLSANNVSVDPAQVTVQL 314
E I L + P++VT+ +
Sbjct: 293 DIEADIPLPDGVKKISPSKVTLHI 316

A related DNA sequence was identified in S.pyogenes <SEQ ID 4241> which encodes the amino acid
sequence <SEQ ID 4242>. Analysis of this protein sequence reveals the following:

Possible site: 29
>>> Seems to have a cleavable N-term signal seq.



Final Results 
bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:



Query: 1 MKRFLNSRPWLGMVSVFFAILLFLTAASSNH NNSSSQIYSPIETYTHSLKDVPIDM 56
M +FLN+R + ++++ FA+LL++ A +SN + T +L D+P+
Sbjct: 1 MDKFLNNRWAVKIIALLFALLLYV-AVNSNQAPTPKKPGESFFPTSTTDFATLTDIPVKA 59

Query: 57 KYDSDKYFISGYSYGAEVYLT-STNRIKLDSEVNNDTRNFKIVADLTHSHPGTVSVNLRV 115
YD + Y ++G V + ST+ +K + T+NF+I AD+ H GT V L+
Sbjct: 60 YYDDENYVVTGVPQTVNVTI KGSTSAVKKARQ TKNFE I YADMEHLKTGTHKVELKA 115

Query: 116 ...
+N+ G+T +++P +VTI ++ +K FPV + ++ GY
Sbjct: 116 ...

Query: 175 ...
DG L ++P+ ++V V
Sbjct: 176 ...

Query: 235 ...
GS++VL+ ++ I +++S + K++
Sbjct: 235 ...

Query: 292 ...
Sbjct: 295 ...



An alignment of the GAS and GBS proteins is shown below. 

Identities = 198/319 (62%), Positives = 251/319 (78%), Gaps = 1/319 (0%) 

Query: 1 MKKFFTNKFWLGVVSLFLAILLFLTATATSMNHQDNSKIAGASETYTHTLTDVPIDIKYD 60

MK+F ++ WLG+VS+F AILLFLTA A+S ++ +S+I ETYTH+L DVPID+KYD 

Sbjct: 1 MKRFLNSRPWLGMVSVFFAILLFLTA-ASSNHKNSSSQIYSPIETYTHSLKDVPIDMKYD 59 

Query: 61 SDDYFISGYSYGADVYMSSVNRVKLDSEINEDTRKFKVVADLTNMKPGTHKVPLKVVNLP 120

SD YFISGYSYGA+VY++S NR+KLDSE+N DTR FK+VADLT+ PGT V L+V NLP
Sbjct: 60 SDKYFISGYSYGAEVYLTSTNRIKLDSEVNNDTRNFKIVADLTHSHPGTVSVNLRVENLP 119

Query: 121 SGVNATVSPTTITVTMGKKKTKEFPVVGHVNDKQIKAGYAVDKMSVDVSKVKVTSDESII 180

SGV ATVSP I+VT+GKK++K FPV G V+ KQI GY + K+ V+KV+VTSDES I 
Sbjct: 120 SGVTATVSPDKISVTIGKKESKVFPVRGSVDAKQIANGYEISKIETGVNKVEVTSDESTI 179 

Query: 181 DRIDHVAANIPDDKVLDDDFNKTVTLQAVTADGTVLASIIHPSKATLSVKVKKLTKTVPI 240

IDHV A +PDD+VLD +++ VTLQAV+ADGT+LAS I P+K LSV VKK+TK+VPI 
Sbjct: 180 ALIDHVYAJO.PDDQVLDRNYSSRVTLQAVSADGTILASAIDPAKTNLSVAVKKITKSVPI 239 

Query: 241 NLIPVGQFSDSISKINYKLSQEKAVISGTKEALEAISVINAEVDISDVTKNTEKKINLSA 300

+ VG DS+S I YKLS++ AVISG++E LE I I AEV+ISDVTKNT K ++LS+
Sbjct: 240 RVEAVGIVMDDSLSDIQYKLSKQTAVISGSREVLEDIDEIIAEVNISDVTKNTSKTVSLSS 299



Query: 301 NNVSVDPAQVTVQLTTTKK 319 

+ VS++P+ VTVQLTTTKK 
Sbjct: 300 SQVSIEPSVVTVQLTTTKK 318



SEQ ID 4240 (GBS99) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 6; MW 35.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 21 (lane 9; MW 60.7kDa). 
The GBS99-GST fusion product was purified (Figure 197, lane 9) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 293), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1384 

A DNA sequence (GBSx1469) was identified in S.agalactiae <SEQ ID 4243> which encodes the amino
acid sequence <SEQ ID 4244>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0503 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1385 

A DNA sequence (GBSx1470) was identified in S.agalactiae <SEQ ID 4245> which encodes the amino
acid sequence <SEQ ID 4246>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

INTEGRAL Likelihood = -9.50 Transmembrane 20 - 36 ( 18 - 46)

INTEGRAL Likelihood = -7.64 Transmembrane 48 - 64 ( 42 - 68)

INTEGRAL Likelihood = -3.40 Transmembrane 80 - 96 ( 80 - 96)

Final Results

bacterial membrane Certainty=0.4800 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11968 GB:Z99105 alternate gene name: ybbQ-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 125/253 (49%), Positives = 186/253 (73%), Gaps = 5/253 (1%)

Query: 27 MDIIIVAVLIYKFIKALAGTKIMSLIQGVILFIIIRFVSEWIGLTTITFLMNQIVTYGVI 86

+DI++V +IYK I + GTK + L++G+++ +++R S+++GL+T+ +LM+Q +T+G + 
Sbjct: 16 VDILLVWYVIYKLIMVIRGTKAVQLLKGIWIVLVRMASQYLGLSTLQWLMDQAITWGFL 75 

Query: 87 AGVVIFAPEIRTGLEKFGRTPQLFTQRSQLSSDE KLVDALVKAVAYMSPRKIGALI S 143

A ++IF PE+R LE+ GR F RS +E K ++A+ KA+ YM+ R+IGAL++ 
Sbjct: 76 AIIIIFQPELRRALEQLGRGR--FFSRSGTPVEEAQQKTISAITKAINYMAKRRIGALLT 133

Query: 144 IERTQTLQEYIATGIPLDADISSELLINIFIPNTPLHDGAVIVKDKKIATACSYLPLSES 203 

IER + +YI TGIPL+A +SSELLINIFIPNTPLHDGAVI+K+ +IA A YLPLSES 
Sbjct: 134 IERDTGMGDYIETGIPLNAKVSSELLINIFIPNTPLHDGAVIMKNNEIAAAACYLPLSES 193 

Query: 204 SSISKEFGTRHRAAIGLSENSDALTVIVSEETGGISVALKGEFLHDLSKDSFEAILRTQL 263 
ISKE GTRHRAA+G+SE +D+LT+IVSEETGG+SVA G+ +L++++ + +L +
Sbjct: 194 PFISKELGTRHRAAVGISEVTDSLTIIVSEETGGVSVAKHGDLHRELTEEM.KEMLEAEF 253



Query: 264 3 

+N + S WY 
Sbjct: 254 KKNTRDTSSNRWY 266 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4247> which encodes the amino acid 
sequence <SEQ ID 4248>. Analysis of this protein sequence reveals the following: 

Possible site: 23 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.64 Transmembrane 20 - 36 ( 19 - 40) 
INTEGRAL Likelihood = -6.21 Transmembrane 48- 64 ( 47- 68) 
INTEGRAL Likelihood = -2.07 Transmembrane 76 - 92 ( 76 - 92) 

Final Results 

bacterial membrane Certainty=0.3654 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:BAB03984 GB:AP001507 unknown conserved protein [Bacillus halodurans] 
Identities = 117/255 (45%), Positives = 178/255 (68%), Gaps = 6/255 (2%)

Query: 19 PWL-LAVHLLDILIVAYLIYRFIKALTGTKIMSLVQGVIFFLVLRFIAEWIGFTTITYLM 77

PWL +LDIL+V Y+IY+ I + GT+ + L++G+ L++ I+ + T+ +++

Sbjct: 8 PWLNYLTQILDILVVTYVIYKAIMIIRGTRAVQLLKGITVILIVYAISIFFNLRTLGWIV 67

Query: 78 NQVITYGVIAGVVIFTPEIRAGLEKFGRSTQVFLQKQYVSSESAL VDALIKSVAYMG 134

NQ ITYG++A ++IF PE+R LE+ GR F + + E + +DA++K+ YMG
Sbjct: 68 NQAITYGIIAVIIIFQPELRRALEQLGRGR--FFASRTAHESETMKKTIDAIVKASTYMG 125

Query: 135 PRKIGALIAIEQTQTLQEYIATGIPLNADISSQLLINIFIPNTPLHDGAVIVGQNKIVAA 194 

R+IGALI++E+ + +Y+ TGIP+NA+++S+LLIN FIPNTPLHDGAVI+ + I+AA 
Sbjct: 126 KRRIGALISMERETGMTDYVETGIPMNANLTSELLINTFIPNTPLHDGAVIINNDTILAA 185

Query: 195 CAYLPLSESKAISKEFGTRHRAAIGLSENSDALTIIVSEETGAISVTRKGQFLHDLSTDE 254 

YLPLSE+ ISKE GTRHRAA+G+SE +D LTI+VSEETG IS+T+ G+ DL ++ 
Sbjct: 186 ACYLPLSENPFISKELGTRHRAALGVSEVTDCLTIVVSEETGHISLTKNGELHRDLDEEQ 245

Query: 255 FETVLRTYLMSNSNV 269 

++L L+S + + 
Sbjct: 246 LRSLLEAELISEAKM 260 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/283 (71%), Positives = 239/283 (84%), Gaps = 2/283 (0%)

Query: 1 MDIFSAIDSKFWASIMENPWMILIHLMDIIIVAVLIYKFIKALAGTKIMSLIQGVILFII 60
M+ S+ID KF S+ +PW++ +HL+DI+IVA LIY+FIKAL GTKIMSL+QGVI F++
Sbjct: 1 MNKLSSIDIKFLLSLFADPWLLAVHLLDILIVAYLIYRFIKALTGTKIMSLVQGVIFFLV 60

Query: 61 IRFVSEWIGLTTITFLMNQIVTYGVIAGVVIFAPEIRTGLEKFGRTPQLFTQRSQLSSDE 120
+RF++EWIG TTIT+LMNQ++TYGVIAGVVIF PEIR GLEKFGR+ Q+F Q+ +SS+
Sbjct: 61 LRFIAEWIGFTTITYLMNQVITYGVIAGVVIFTPEIRAGLEKFGRSTQVFLQKQYVSSES 120

Query: 121 KLVDALVKAVAYMSPRKIGALISIERTQTLQEYIATGIPLDADISSELLINIFIPNTPLH 180
LVDAL+K+VAYM PRKIGALI+IE+TQTLQEYIATGIPL+ADISS+LLINIFIPNTPLH
Sbjct: 121 ALVDALIKSVAYMGPRKIGALIAIEQTQTLQEYIATGIPLNADISSQLLINIFIPNTPLH 180

Query: 181 ...
Sbjct: ...



Query: 241 ALKGEFLHDLSKDSFEAILRTQLIQNQEENSKLAWYNQLLRRK 283 
KG+FLHDLS D FE +LRT L+ N N L WY ++L K
Sbjct: 241 TRKGQFLHDLSTDEFETVLRTYLMSN--SNVTLPWYKKILGGK 281

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1386 

A DNA sequence (GBSx1471) was identified in S.agalactiae <SEQ ID 4249> which encodes the amino
acid sequence <SEQ ID 4250>. Analysis of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.50 Transmembrane 33 - 49 ( 33 - 49)

Final Results

bacterial membrane Certainty=0.2041 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1387 

A DNA sequence (GBSx1472) was identified in S.agalactiae <SEQ ID 4251> which encodes the amino
acid sequence <SEQ ID 4252>. Analysis of this protein sequence reveals the following:
Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1001 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



A related GBS nucleic acid sequence <SEQ ID 9781> which encodes amino acid sequence <SEQ ID 9782> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC84012 GB:AF080002 UDP-N-acetylmuramyl tripeptide synthetase
MurC [Heliobacillus mobilis]
Identities = 143/442 (32%), Positives = 229/442 (51%), Gaps = 17/442 (3%)

Query: 12 GKSAHYLLSKMGRGST-YPGSLALKFDKDILDTIAKDYE--IVVVTGTNGKTLTTALTVG 68

GK+A +L + G G T +PG + + IL +A+ + +VVTGTNGKT T+ + 
Sbjct: 2 GKTAIWLNRRFGHGGTSFPGGIGRRVAPQILTALARQLKRGAMVVTGTNGKTTTSKMLAA 61

Query: 69 ILKEAFGQVVTNPSGANMITGIVSTFLTAKKSKSG--KKIAVLEIDEASLPRITQYIKPS 126
I++++ + N +GAN++ GI + F+ + + ++E+DEA++P++ + ++P

Sbjct: 62 IVEKSSLTLTHNRAGANLVGGITTAFIDSATIGGSITSDLGIIEVDEATIPQLVREVQPK 121

Query: 127 LFVFTNIFRDQMDRYGEIYTTYQMILDGAANAP-QATILANGDSPLFNS--KSVTNPVQF 183
V TN FRDQ+DR+GE+ T+++ PQ++NDPLSK V +

Sbjct: 122 GVVVTNFFRDQLDRFGELDKTVSLVGEALRLLPVC'SIAVLNADDPLVASLGKDFPGRVLY 181
Sbjct: 182 FGIDDRSYGAREMLQSAETRFCRLCGHPITYDWFFFGQLGHYRCSHCGFERPEPKIKVTG 241

Query: 244 LTHLTNTSSGFVIDGQ QYNINVG GLYNIYNA LAAVS VAEYFGVEPSQI KDGFDKSR 299
+ S F ++ Q ++ G YNIYNALAA++ A + I+ G R
Sbjct: 242 IQLKGEEGSAFTVETPRGTWQLELSTPGFYKI YNALAAIAEAIRLDLPEKAIRAGLQGYR 301

Query: 300 AVFGRQETFTIGN-KKCTLVLIKNPVGASQALDMIKLAPYPFSLSVLLNANYADGIDTSW 358
FGR E + + ++ L LIKNP G + + + PL V++N N ADG D SW
Sbjct: 302 TNFGRMERIELEDGRRAFLALIKNPTGCDEVIRTLVQl'IRGPKRLLVl INDNAADGRDISW 361

Query: 359 IWDANFETI--LTMNIPEIFAGGVRHSEIARRLRVTGYDEKRIK-QADKLQDIMTMIEQQ 415
+WDA+FE++ + + +F G+R ++A RL TG + I+ +A+ I + +E
Sbjct: 362 LWDADFESLEPVYPELRSVFTSGLRGEDMALRLNYTGIPAESIRYEANVESAIRSALEMT 421

Query: 416 ET-EHAYILATYTAMLEFREIL 436
E E YIL TYTA+LE + L
Sbjct: 422 EPGETLYILPTYTALLESKAAL 443



A related DNA sequence was identified in S.pyogenes <SEQ ID 4253> which encodes the amino acid 
sequence <SEQ ID 4254>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 343/446 (76%), Positives = 393/446 (87%)

Query: 1 MKINTALGVAAGKSAHYLLSKMGRGSTYPGSLALKPDKDILDTIAKDYEIVVVTGTNGKT 60
MK+ T LG+ AGK+A +L+K+GRGSTYPG LAL DKDIL ++KDY+IVVVTGTNGKT
Sbjct: 1 MKMKTLLGIIAGKAAQSILTKLGRGSTYPGRLAIACDKDILKDLSKDYDIVVVTGTNGKT 60

Query: 61 LTTALTVGILKEAFGQVVTNPSGANMITGIVSTFLTAKKSKSGKKIAVLEIDEASLPRIT 120

LTTALTVGILKEAFG+++TNPSGANMITGI STFL AKK KS ++IAVLEIDEASLPRIT
Sbjct: 61 LTTALTVGILKEAFGEIITNPSGANMITGITSTFLAAKKGKSERQIAVLEIDEASLPRIT 120

Query: 121 QYIKPSLFVFTNIFRDQMDRYGEIYTTYQMILDGAANAPQATILANGDSPLFNSKSVTNP 180

Y+KPSLFV+TNIFRDQMDRYGEIYTTYQMI+DGA NAP+ATILANGDSP+F+SK + NP
Sbjct: 121 TYLKPSLFVYTNIFRDQMDRYGEIYTTYQMIVDGARNAPKATILANGDSPIFSSKDIVNP 180






Query: 181 VQFYGFNTDKHEPRLAHYNTEGILCPKCQAILTYRLNTYANLGDYTCPNCDFERPNLDYA 240

VQ+YGF+T KH P+LAHYNTEGILCPKC+ IL YRLNTYANLGD+ C NC F+RP LDY
Sbjct: 181 VQYYGFDTAKHAPQLAHYNTEGILCPKCEHILQYRLNTYANLGDFVCLNCQFQRPTLDYQ 240

Query: 241 LTRLTHLTNTSSGFVIDGQQYNINVGGLYNIYNALAAVSVAEYFGVEPSQIKDGFDKSRA 300 

LT LT +T+ SS FVIDGQ Y INVGGLYNIYNALAAVSVAE+FGV P +IK GF+KS+A
Sbjct: 241 LTELTAITHQSSEFVIDGQNYKINVGGLYNIYNALAAVSVAEFFGVSPEKIKAGFNKSKA 300 



Query: 301 VFGRQETFTIGNKKCTLVLIKNPVGASQALDMIKLAPYPFSLSVLLNANYADGIDTSWIW 360 

VFGRQETFT+G+K CTL+LIKNPVGASQAL+MI+LA YPFSLSVLLNANYADGIDTSWIW 
Sbjct: 301 VFGRQETFTVGDKSCTLILIKNPVGASQALEMIQLADYPFSLSVLLNANYADGIDTSWIW 360 

Query: 361 DANFETILTMNIPEIFAGGVRHSEIARRLRVTGYDEKRIKQADKLQDIMTMIEQQETEHA 420

DANFE I M I EI AGGVRHSEIARRLRVTG+D+ +IKQA+KL+ I+ IE+QE +HA
Sbjct: 361 DANFELITQMPITEINAGGVRHSEIARRLRVTGFDDTKIKQAEKLEQIIETIEKQEAKHA 420 



YILATYTAMLEFR +LA+ + + KEM 
Sbjct: 421 YILATYTAMLEFRSLLADRHWEKEM 446 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1388

A DNA sequence (GBSx1473) was identified in S.agalactiae <SEQ ID 4255> which encodes the amino
acid sequence <SEQ ID 4256>. Analysis of this protein sequence reveals the following:



Final Results

bacterial cytoplasm Certainty=0.3010 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC84011 GB:AF080002 cobyric acid synthase CobQ [Heliobacillus 
mobilis] 

Identities = 89/250 (35%), Positives = 129/250 (51%), Gaps = 9/250 (3%)





Query: 11 ...
Sbjct: 2 ...

Query: 71 ...
Sbjct: ...

Query: 130 ...
Sbjct: 122 ...

Query: 186 GNNKEDGTEGVHYKNVFGSYFHGPILSRNANLAYRLVATALRNKYG---KEIVLPSYEEI 242
GNN +D EG YKN G+Y HGP+L +N LA L++ AL +YG + ++E
Sbjct: 181 ...

Query: 243 ...
Sbjct: 241 ...



A related DNA sequence was identified in S.pyogenes <SEQ ID 4257> which encodes the amino acid
sequence <SEQ ID 4258>. Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2586 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 197/260 (75%), Positives = 224/260 (85%)

Query: 1 MTYTSLKSPTTKDYKYTLOTAHLYGNLLNTYGDNGNILMMKYVC-EKLGCQMTFDIVSLED 60 

MTYTSLKSP +DY Y L +AHLYGNL+NTYGDNGNILM+KYV EKLG ++T DIVS+ D 
Sbjct: 1 MTYTSLKSPENQDYIYDLTIAHLYGNLMNTYGDNGNILMLKA/AEKLGARvTVDIVSIND 60 

Query: 61 RFDPNYYQmFFGGGQDYEQAIVARDLPSKKEDlNKFIQTOGVVLAICGGFQLLGQYYIQ 120 

F+ + Y + FFGGGQDYEQ+IVA+DLPSKK + +I NN VVLAICGGFQLLGQYY+Q
Sbjct: 61 TFEQDDYDIVFFGGGQDYEQSIVAKDLPSKKAALADYIANNKVVLAICGGFQLLGQYYVQ 120
Query: 181 VIYGNGNNKEDGTEGVHYKNVFGSYFHGPILSRNANLAYRLVATALRNKYGKEIVLPSYE 240

V+YGNGNNKED TEGVHYKNV+GSYFHGPILSRN NLAYRLV TAL+ KYG I LPSY+
Sbjct: 181 VWGNGNNKEDQTEGVHYKNVYGSYFHGPILSRNVNLAYRLVTTALKKKYGSAISLPSYD 240

Query: 241 EILSLEIPEEYGDVKSKADF 260 

+IL EI EEY D+KSKA F 
Sbjct: 241 DILKQEITEEYADLKSKASF 260 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1389

A DNA sequence (GBSx1474) was identified in S.agalactiae <SEQ ID 4259> which encodes the amino
acid sequence <SEQ ID 4260>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1701 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04402 GB:AP001509 lipoate-protein ligase [Bacillus halodurans] 
Identities = 153/316 (48%), Positives = 212/316 (66%), Gaps = 3/316 (0%)
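Each homology hit begins with a line of count/length pairs. The Python sketch below turns that line into numbers. It assumes the line always follows the `Name = count/length (percent%)` pattern used in this document. The exact fraction is stored next to the printed integer percentage rather than recomputed, because this document does not say how the search program rounded its percentages. `parse_alignment_summary` is a hypothetical helper name.

```python
import re
from typing import Dict

# Matches e.g. "Identities = 153/316 (48%)"; spacing around "=" and "/" is optional.
_STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line: str) -> Dict[str, Dict[str, float]]:
    """Turn an 'Identities = a/b (p%), ...' line into counts and exact fractions."""
    stats = {}
    for name, count, length, printed in _STAT.findall(line):
        count, length = int(count), int(length)
        stats[name] = {
            "count": count,
            "length": length,
            "printed_pct": int(printed),  # value exactly as printed in the report
            "fraction": count / length,   # exact ratio, no rounding applied
        }
    return stats


summary = parse_alignment_summary(
    "Identities = 153/316 (48%), Positives = 212/316 (66%), Gaps = 3/316 (0%)"
)
print(summary["Identities"]["count"], summary["Gaps"]["printed_pct"])  # 153 0
```

Keeping both values lets a reader compare the printed percentage with the exact ratio without assuming any particular rounding rule.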





Query: 10 DPAYNVALEAYAFQKLTDIDEIFIL-WINEPAIIIGRHQNTIQEINKEFIDKNGIHVVRR 68
DP N+A+E YA + L DI+E ++L +INEP+IIIGR+QNTI+EIN E+++ NGIHVVRR
Sbjct: 11 DPRINLAIEEYALKNL-DINETYLLFYINEPSIIIGRNQNTIEEINTEYVESNGIHVVRR 69

Query: 69 LSGGGAVYHDLNNLNYTIISNNTQEGAFDFQTFSKPVIDTLAKLGVKAEFTGRNDL-EIN 127
LSGGGAVYHD NLN++ I+ + E +FQ F+ PVI LAKLGV AE GRND+ +
Sbjct: 70 LSGGGAVYHDHGNI^FSFITKDDGESFSNFQKFTDPVIKALAKLGVTAELKGRNDIIASD 129

Query: 128 GQKFAGNAQAYYKGRMMHHGCLLFDVDMSVLGQALKVSKDKIESKGIKSVRARVTNIVDH 187
G+K +GNAQ KGRM HG LLFD ++ + AL VSKDKIESKGIKS+R+RV NI +
Sbjct: 130 GRKISGNAQFSTKGRMFSHGTLLFDSEIDHWSALNVSKDKIESKGIKSIRSRVANISEF 189

Query: 188 LSDKITVQEFSDAILAQMKEEYPEMDEYVLSDAELSEIQAMRDNQFATWDWTYGKAPEYT 247
L++KI++ +F +L + + + EY L+ + +EI + ++ WDW YGK+P +
Sbjct: 190 LTEKISIDQFRSLLLESIFDGQANIQEYKLTADDWAEHELSKERYQNWDWNYGKSPAFN 249

Query: 248 IERGVRYPAGKITTYANVENSTIKSVKIFGDFFGVKPVDDIEKMLEGVRYDYKDVLAALK 307
++ R+P G I V+ TI+ KIFGDFFG V D+E L G+RY+ D+ AL
Sbjct: 250 ... 309

Query: 308 TVDTSQYFSRMTPEEI 323
VD YF ++ ++I
Sbjct: 310 DVDVKIYFGQVEKDDI 325





A related DNA sequence was identified in S.pyogenes <SEQ ID 4261> which encodes the amino acid
sequence <SEQ ID 4262>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1271 (Affirmative) < succ>
An alignment of the GAS and GBS proteins is shown below.

Identities = 249/328 (75%), Positives = 292/328 (88%) 

Query: 1 MKYIVNTSNDPAYNVALEAYAFQKLTDIDEIFILWINEPAIIIGRHQNTIQEINKEFIDK 60 

MKYIVN S++PA+N+ALEAYAF++L + DE+FILWINEPAIIIG+HQNTIQEINKE+ID+ 
Sbjct: 1 MKYIVNKSHNPAFNIALEAYAFRELVEEDELFILWINEPAIIIGKHQNTIQEINKEYIDE 60 

Query: 61 NGIHVVRRLSGGGAVYHDLNNLNYTIISNNTQEGAFDFQTFSKPVIDTLAKLGVKAEFTG 120

+GIHVVRRLSGGGAVYHDLNNLNYTIISN T EGAFDF+TFS+PVI TLA LGV A FTG
Sbjct: 61 HGIHVVRRLSGGGAVYHDLNNLNYTIISNKTAEGAFDFKTFSQPVIATLADLGVTANFTG 120

Query: 121 RNDLEINGQKFAGNAQAYYKGRMMHHGCLLFDVDMSVLGQALKVSKDKIESKGIKSVRAR 180
RND+EI+G+K GNAQAYYKGRMMHHGCLLFDVDM+VLG ALKVSKDKIESKG+KSVRAR 
Sbjct: 121 RNDIEIDGICKICGNAQAYYKGRMMHHGCLLFDVDMTVLGDALKVSKDKIESKGVKSVRAR 180



Query: 241 GKAPEYTIERGVRYPAGKITTYANVENSTIKSVKIFGDFFGVKPVDDIEKMLEGVRYDYK 300 

GKAPEYTIER VRYPAGKI +T+ANVENS IK++KI+GDFFG+K V DIE +L G +Y+Y+
Sbjct: 241 GKAPEYTIERNVRYPAGKISTFANVENSIIKKLKIYGDFFGIKDVQDIENLLIGCKYEYR 300 

Query: 301 DVLAALKTVDTSQYFSRMTPEEITKAIV 328

DV LKT+DT+QYFSRMT EE+ KAIV 
Sbjct: 301 DVFERLKTIDTTQYFSRMTVEEVAKAIV 328 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1390 

A DNA sequence (GBSx1475) was identified in S.agalactiae <SEQ ID 4263> which encodes the amino
acid sequence <SEQ ID 4264>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.70 Transmembrane 294 - 310 ( 294 - 312)

Final Results

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA21748 GB:L31844 dihydrolipoamide dehydrogenase [Clostridium 
magnum] 

Identities = 229/589 (38%), Positives = 339/589 (56%), Gaps = 25/589 (4%)

Query: 1 MAFDVIMPKLGVDMQEGEILEWKKNEGDTVNEGDVLLEIMSDKTNMEIEAEDTGVLLKIV 60

MA V+MPKLG+ M EG ++ WKK EGD V G++L E+ +DK E+E+ D G++ K++ 
Sbjct: 1 MAKIVVMPKLGLTMTEGTLVTWKKAEGDQVKVGEILFEVSTDKLTNEVESSDEGIVVKLL 60

Query: 61 HQAGDVVPVTEVIAYIGEEGEEVGTSSPSADATITAEDGQSVSGPAAPSQETVAAATPKE 120
GDVV +A IG E++ + +G S +A +T A PK+

Sbjct: 61 VNEGDVVECLNPVAIIGSADEDISSLLNGSSEGSGSAEQSDTKA---PKK 107

Query: 121 ELAADEY--DIVVVGGGPAGYYAAIRGAQLGGKIAIVEKTEFGGTCLNVGCIPTKTYLKN 178
E+ A + ++VV+GGGP GY AAIR AQLG K+ ++EK GGTCLNVGCIPTK L +
Sbjct: 108 EVEAVVGGDNLVVIGGGPGGYVAAIRAAQLGAKVTLIEKESLGGTCLNVGCIPTKVLLHS 167
Query: 179 AEILDGLIWAAGRGIHl^STNYAIDMDKWAFKNSVVKTLTGGTOGLLK^KVEIFNGLG 238

+++L +K GI++ + ++ K V+K L GV GLL NKV++ G 

Sbjct: 168 SQLLTEMKEGDKLGIDIEGS-IVVNWKKIQKRKKIVIKKLVSGVSGLLTCNKVICVIKIGTA 226

Query: 239 QVNPDKSVVIGDK VIKGRNVVLATGSKVSRINIPGIESPLVLTSDDILDLREIPK 293

+ ++++ + + N ++ATGS I G + V+ S L Ii P+ 

Sbjct: 227 KFESKDTILVTKEDGVAEICVNFDNAIIATGSMPFIPEIEGNKLSGVIDSTGALSLESNPE 286 

Query: 294 SIAVMGGGVVGIELGLVWASYGVDVTVIEMADRIIPAMDKEVSLELQKILAKKGMKIKTS 353

S+A++GGGV+G+E ++ S G V++IEM I+P MD+E+S + L + G+ I + 
Sbjct: 287 SIAIIGGGVIGVEFASIFNSLGCKVSIIEMLPHILPPMDREISEIAKAKLIRDGININNN 346 

Query: 354 VGVSEIVEANNQLTLKL--NNGEEVV-ADKALLSIGRVPQMNGLENLEPELEMERGRIKV 410

V+ I + 4 h + + GEE + +K L+++GR + GL+ + ++ E G I V 
Sbjct: 347 CKVTRIEQGEDGLKVSFIGDKGEESIDVEKVLIAVGRRSNIEGLDVEKIGVKTEGGSIIV 406 

Query: 411 NAYQETSIPGIYAPGDVNGTRMLAHAAYRMGEVAAENALGGNKRKAHLDFTPAAVYTHPE 470

N ET++ GIYA GD G MLAH A G VAAEN +G NK K PA VYT PE 

Sbjct: 407 l^KMETNVEGIYAIGDCTGKIMLAHVASDQGVVAAENIMGQNK-KNIDYKTVPACVYTKPE 465

Query: 471 VAMVGMTEEQAREQYGDILVGKNSFTGNGRAIASNEAHGFVKVIAEPKYKEILGVHIIGP 530
+A VG+TEEQA+E+ D VGK NG+++ NE G +K+I + KY+EILGVHI+GP 

Sbjct: 466 LASVGLTEEQAKEKGIDYKVGKFQLAANGKSLIMNETGGVIKIITDKKYEEILGVHILGP 525 

Query: 531 AAAELINEASTIMENELTVYDVAQSIHGHPTFSEVMYEAFLDVLGEAIH 579 

A +LI EA+ + E T+ ++ ++H HPT E M EA L V +AIH 
Sbjct: 526 RATDLITEAALALRLEATLEEIITTVHAHPTVGEAMKEAALAVNNQAIH 574 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1819> which encodes the amino acid 
sequence <SEQ ID 1820>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.70 Transmembrane 297 - 313 ( 297 - 315)

Final Results

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 497/591 (84%), Positives = 538/591 (90%), Gaps = 10/591 (1%)



Query: 1 MAFDVIMPKLGVDMQEGEILEWKKNEGDTVNEGDVLLEIMSDKTNMEIEAEDTGVLLKIV 60
MA ++IMPKLGVDMQEGEI+EWKK EGDTVNEGD+LLEIMSDKTNME+EAED+GVLLKI
Sbjct: 1 MAVEIIMPKLGVDMQEGEIIEWKKQEGDTVNEGDILLEIMSDKTNMELEAEDSGVLLKIT 60

Query: 61 HQAGDVVPVTEVIAYIGEEGEEVGTSSPSA---DATITAEDGQS--VSGPAAPSQETVAA 115
QAG+ VPVTEVI YIG EGE V SSP+A + T ED ++ + P AP+Q A+
Sbjct: 61 RQAGETVPVTEVIGYIGAEGESVEVSSPAASDVNVARTTEDLEA^GLEVPKAPAQ--AAS 118

Query: 116 ATPKEELAADEYDIVVVGGGPAGYYAAIRGAQLGGKIAIVEKTEFGGTCLNVGCIPTKTY 175
A PK LA DEYDI+VVGGGPAGYYAAIRGAQLGGKIAIVEK+EFGGTCLNVGCIPTKTY
Sbjct: 119 AAPKAALADDEYDIIVVGGGPAGYYAAIRGAQLGGKIAIVEKSEFGGTCLNVGCIPTKTY 178

Query: 236 GLGQVNPDKSVVIGDKVIKGRNVVLATGSKVSRINIPGIESPLVLTSDDILDLREIPKSL 295
GLGQVNPDK+V IG + IKGRNV+LATGSKVSRINIPGI+S LVLTSDDILDLRE+PKSL
Sbjct: 239 GLGQVNPDKTVTIGSQTIKGRNVILATGSKVSRINIPGIDSKLVLTSDDILDLREMPKSL 298

Query: 296 AVMGGGVVGIELGLVWASYGVDVTVIEMADRIIPAMDKEVSLELQKILAKKGMKIKTSVG 355
AVMGGGVVGIELGLVWASYGVDVTVIEMADRIIPAMDKEVSLELQKIL+KKGMKIKTSVG
Sbjct: 299 AVMGGGVVGIELGLVWASYGVDVTVIEMADRIIPAMDKEVSLELQKILSKKGMKIKTSVG 358
Query: 356 VSEIVEANNQLTLKLNNGEEVVADKALLSIGRVPQMNGLENLEPELEMERGRIKVNAYQE 415

VSEIVEAMNQLTLKLNNGEEVVA+KALLSIGRV QMNGLENL LEM+R RIKVN YQE
Sbjct: 359 VSEIVEAiTOQLTLKLNNGEEVVAEKALLSIGRVSQMNGLENL--NLEMDRNRIKVNDYQE 416

Query: 416 TSIPGIYAPGDVNGTRMLAHAAYRMGEVAAENALGGN-KRKAHLDFTPAAVYTHPEVAMV 474

TSIPGIYAPGDVNGT+MLAHAAYRMGEVAAENA+ GN RKA+L +TPAAVYTHPEVAMV
Sbjct: 417 TSIPGIYAPGDVNGTKMLAHAAYRMGEVA^ENA^GNTTRI<A1^KYTPAAVYTHPEVAMV 476

Query: 475 GMTEEQAREQYGDILVGKNSFTGNGRAIASNEAHGFVKVIAEPKYKEILGVHIIGPAAAE 534

G+TEEQAREQYGD+L+GKNSFTGNGRAIASNEAHGFVKVIA+ KY EILGVHIIGPAAAE
Sbjct: 477 GLTEEQAREQYGDVLIGKNSFTGNGRAIASKEAHGFVKVIADAKYHEILGVHIIGPAAAE 536

Query: 535 LINEASTIMENELTVYDVAQSIHGHPTFSEVMYEAFLDVLGEAIHNPPKRK 585
+INEA+TIME+ELTV ++ SIHGHPTFSEVMYEAF DVLGEAIHNPPKRK

Sbjct: 537 MINEAATIMESELTVDELLLSIHGHPTFSEVMYEAFADVLGEAIHNPPKRK 587

SEQ ID 4264 (GBS681) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 165 (lane 2; MW 68.3kDa) and in Figure 188 (lane 10; MW 68kDa).

Purified GBS681-His is shown in Figure 240, lanes 5-6.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1391 

A DNA sequence (GBSx1476) was identified in S.agalactiae <SEQ ID 4265> which encodes the amino
acid sequence <SEQ ID 4266>. This protein is predicted to be dihydrolipoamide acetyltransferase. Analysis
of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4456 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04497 GB:AP001509 dihydrolipoamide S-acetyltransferase
[Bacillus halodurans] 
Identities = 187/462 (40%), Positives = 266/462 (57%), Gaps = 26/462 (5%) 

Query: 1 MAVEIIMPKLGVDMQEGEILEWKKQVGDVVNEGDVLLEIMSDKTNMEIEAEDSGVLLKIT 60
MA EI MPKL MQEG +L+W K+ GD V G+ L EIM+DK N+E+EA + G LLK
Sbjct: 1 MAKEIFMPKLSSTMQEGTLLQWFKEEGDRVEVGEPLFEIMTDKINIEVEAYEEGTLLKRY 60

Query: 61
+G D +PV IGYIG
Sbjct: 61

Query: 121
Sbjct:

Query: 181
GK+ + DV A
Sbjct: 176

Query: 241
K ++K M +S +AP T+ +IDM+ + +R +L+ I +TG ++S+T+++ AV
Sbjct: 219
Query: 301 LMKPEHRYLNASLINDAQEIELHNFVNIGIAVGLDDGLIVPVVHNADQMSLSDFVIASKD 360

LM H +NAS + EI H V+IG+AV ++ GL+VPVV + D+ L+ K
Sbjct: 279 LMS--HPTINASFFEN--EIVYHEDVHIGLAVAVEGGLVVPVVKHVDKKGLAQLTNECKT 334

Query: 361 VIKKTQEGKLKSAEMSGSTFSITNLGMFGTKTFNPIINQPNSAILGVGATIPTPTVVDGE 420

V ++ +L MSG TF+I+NLGM+ F P+INQP SAILGVG P +DG+
Sbjct: 335 VAMAARDWLSQEMMSGGTFTISNLGMYAIDVFTPVINQPESAILGVGRIQEKPVGIDGQ 394

Query: 421 IVARPIMAMCLTIDHRIVDGMNGAKFMVDLKNLMENPFGLLI 462

I RP+M L+ DHR++DG A F+ D+K+++E PF LL+
Sbjct: 395 IELRPMMTASLSFDHRVIDGAPAAAFLTDVKSMLEQPFQLLM 436

A related DNA sequence was identified in S.pyogenes <SEQ ID 4267> which encodes the amino acid 
sequence <SEQ ID 4268>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4774 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 354/473 (74%), Positives = 390/473 (81%), Gaps = 15/473 (3%) 

Query: 1 MAVEIIMPKLGVDMQEGEILEWKKQVGDVVNEGDVLLEIMSDKTNMEIEAEDSGVLLKIT 60 

MA EIIMPKLGVDMQEGEI+EWKKQ GD VNEGD+LLEIMSDKTNME+EAEDSGVLLKIT 
Sbjct: 1 MAFEIIMPKLGVDMQEGEIIEWKKQEGDTVNEGDILLEIMSDKTNMELEAEDSGVLLKIT 60

Query: 61 HGNGDVVPVTETIGYIGAEGEEVTEASSSENTS VEENATQVTSEPEKVEETSEPS 115

GD VPVTE IGYIGAEGE V +SSE T+ +A + E V + P 

Sbjct: 61 RQAGDTVPVTEVIGYIGAEGESVDTIASSEKTTEIPVPASADAGPAVAPKENVASPA-PQ 119 

Query: 116 VPAAT SGEKVRATPAARKLAREMSIDLALVSGTGANGRVHREDVENFKGAQPRITP 171 

V A +G KVRATPAARK A EM IDL V GTG GRVH+EDVENFKGAQP+ +P
Sbjct: 120 VAATAIPQGNGGKVRATPAARKAAAEMGIDLGQVPGTGPKGRVHKEDVENFKGAQPKASP 179 

Query: 172 LARRIAEDQGVDIAEITGSGIRGKIVKNDVLAAMSPQAAEAPVETKATPTTEEK--QLPE 229

LAR+IA D+G+D+A ++G+G GK++K D++A + A P E KA EEK LPE 
Sbjct: 180 LARKIAADKGIDtiATVSGTGFNGKVMKEDIMAILE AAKPAEAI^APAAKEEKWDLPE 236 

Query: 230 GVEVIKMSAMRKAISKGMTNSYLTAPSFTLNYDIDMTEMMALRKKLIDPIMAKTGLKVSF 289 

GVE MSAMRKAISKGMTNSYLTAP+FTLNYDIDMTEM+ALRKKLIDPIMAKTGLKVSF
Sbjct: 237 GVEHKPMSAMRKAISKGMTNSYLTAPTFTLNYDIDMTEMIALRKKLIDPIMAKTGLKVSF 296 

Query: 290 TDLIGMAVVKTLMKPEHRYLNASLINDAQEIELHNFVNIGIAVGLDDGLIVPVVHNADQM 349

TDLIGMAVVKTLMKPEH Y+NASLINDA +IELH FVN+GIAVGLDDGLIVPV+H A++M
Sbjct: 297 TDLIGMAVVKTLMKPEHEYMNASLINDANDIELHRrATCLGIAVGLDDGLIVPVIHGANKM 356

Query: 350 SLSDFVIASKDVIKKTQEGKLKSAEMSGSTFSITNLGMFGTKTFNPIINQPNSAILGVGA 409

LSDFV+ASKDVIKK Q GKLK+AEMSGSTFSITNLGMFGTKTFNPIINQPNSAILGVGA
Sbjct: 357 CLSDFVLASKDVIKKAQTGKLKAAEMSGSTFSITNLGMFGTKTFNPIINQPNSAILGVGA 416 

Query: 410 TIPTPTVVDGEIVARPIMAMCLTIDHRIVDGMNGAKFMVDLKNLMENPFGLLI 462

TIPTPTVVDGEIV+RPIMAMCLTIDHR+VDGMNGAKFMVDLK LMENPF LLI
Sbjct: 417 TIPTPTVVDGEIVSRPIMAMCLTIDHRLVDGMNGAKFMVDLKKLMENPFELLI 469



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
Example 1392

A DNA sequence (GBSx1477) was identified in S.agalactiae <SEQ ID 4269> which encodes the amino
acid sequence <SEQ ID 4270>. This protein is predicted to be acetoin dehydrogenase (TPP-dependent)
beta chain (pdhB). Analysis of this protein sequence reveals the following:



Final Results

bacterial cytoplasm --- Certainty=0.1267 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9779> which encodes amino acid sequence <SEQ ID 9780> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04496 GB:AP001509 acetoin dehydrogenase (TPP-dependent) beta
chain [Bacillus halodurans] 
Identities = 189/319 (59%), Positives = 249/319 (77%), Gaps = 1/319 (0%) 



P T ++K LLK++I DNNPVIF E K Y K V

SVKKT +LI+V++A K GGF GEIA+++AESEAFDYLD PI F

+P V I +A+ + +N
AIPQVPDIIEAVKETLN 325

A related DNA sequence was identified in S.pyogenes <SEQ ID 4271> which encodes the amino acid
sequence <SEQ ID 4272>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.00 Transmembrane 81 - 97 ( 81 - 97)

Final Results

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB04496 GB:AP001509 acetoin dehydrogenase (TPP-dependent) beta 
chain [Bacillus halodurans] 
Identities = 187/319 (58%), Positives = 244/319 (75%), Gaps = 1/319 (0%) 



Query: 11 EAVNLAMTEEMRKDENIFLMGEDVGVYGGDFGTSVGMIEEFGPKRVKDTPISEAAISGAA 70

EA+ AMT EMRK+E++F++GED+GVYGG FG + GMIEEFG +RV++TPISEAAISG A
Sbjct: 8 EAIREAMTLEMRKNEDVFILGEDIGVYGGAFGVTRGMIEEFGSERVRNTPISEAAISGTA 67

Query: 71 IGAAITGLRPIVDVTFMDFLTIMMDAIVNNGAKNWYMFGGGLITPVTFRVASGSGIGSAA 130

IGAA+TG+RPI+++ F DF+TI MD +VN AK YM+GG P+ R +GSG G+AA
Sbjct: 68 IGAALTGMRPILELQFSDFITIAMDJOTVNPAAKLRYJTYGGKAKVPMVLRTPAGSGTGAAA 127

Query: 131 QHSQSLEAWLTHIPGIKVVAPGNANDAKGLLKSAIRDNNIVLFMEPKALYGKKEEVNQDP 190

QHSQSLEAW+THIPG+KVV P A DAKGLLK+AI DNN V+F E K Y K V ++
Sbjct: 128 QHSQSLEAWMTHIPGLKVVQPATAYDAKGLLKAAIDDNNPVIFYEHKLCYRTKCHVPEE- 186

Query: 191 DFYIPLGKGDIKREGTDLTIVSYGRMLERVLQAAEEVAADGINVEVVDPRTLIPLDKELI 250

++ IPLGK D+KR+GTD+T+V+ M+ + L+AA E+ +GI+VEV+DPRTL+PLD+E I
Sbjct: 187 EYSIPLGKADVKRKGTDVTWATAVI>IVHKALEAAVELEKEGISVEVIDPRTLVPLDEETI 246

Query: 251 IESVKKTGKLMLVNDAYKTGGFIGEIATMITESEAFDYLDHPIVRLASEDVPVPYARVLE 310

I SVKKT +L++V++A K GGF GEIA++I ESEAFDYLD PI RL + VP+PY LE
Sbjct: 247 IRSVKKTSRLIVVHEAVKRGGFGGEIASIIAESEAFDYLDAPIKRLGGKPVPIPYNPTLE 306



Query: 311 QAILPDVEKIKAAIVKMAN 329

+A +P V I A+ + N
Sbjct: 307 RAAIPQVPDIIEAVKETLN 325
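Two bookkeeping rules make each alignment block easy to check. First, the midline repeats a residue letter wherever both rows are identical (and shows "+" for similar residues). Second, a row's end coordinate equals its start coordinate plus the number of residues minus one, with gap characters (`-`) not counted. The Python sketch below applies both rules to the last block of the alignment above. `count_identities` and `end_coordinate` are illustrative helper names, not functions of any search program.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count aligned columns where both rows carry the same residue (gaps excluded)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def end_coordinate(start: int, aligned_row: str) -> int:
    """Last residue number of a row: start + residues - 1, skipping '-' gap columns."""
    residues = sum(1 for c in aligned_row if c != "-")
    return start + residues - 1


query = "QAILPDVEKIKAAIVKMAN"  # Query: 311 ... 329
sbjct = "RAAIPQVPDIIEAVKETLN"  # Sbjct: 307 ... 325

print(count_identities(query, sbjct))                          # 6
print(end_coordinate(311, query), end_coordinate(307, sbjct))  # 329 325
```

The count of 6 matches the six residue letters (A, P, V, I, A, N) on the midline of that block. The same end-coordinate rule explains rows with gaps, such as the Query 10 to 68 row of the lipoate-protein ligase hit, where one gap column sits among 59 residues.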



An alignment of the GAS and GBS proteins is shown below. 

Identities = 286/331 (86%), Positives = 310/331 (93%)

Query: 1 MSETKVMALREAINVAMSEEMRKDEKVFLMGEDVGVYGGDFGTSVGMLEEFGAKRVRDTP 60

MSETK+MALREA+N+AM+EEMRKDE +FLMGEDVGVYGGDFGTSVGM+EEFG KRV+DTP
Sbjct: 1 MSETKLMALREAVNLAMTEEMRKDENIFLMGEDVGVYGGDFGTSVGMIEEFGPKRVKDTP 60



Query: 61 ISEAAIAGSAIGAAQTGLRPIVDLTFMDFVTIAMDAIWOJ3AKTNYMFGGGLSTPVTFRV 120
ISEAAI+G+AIGAA TGLRPIVD+TFMDF+TI MDAIVN GAK NYMFGGGL TPVTFRV
Sbjct: 61 ISEAAISGAAIGAAITGLRPIVDVTFMDFLTIMMDAIVNHGAKNWYMFGGGLITPVTFRV 120

Query: 121 ASGSGIGSAAQHSQSLEAWLTHIPGLKVVAPGTVNESKALLKSSILDNNPVIFLEPKALY 180
ASGSGIGSAAQHSQSLEAWLTHIPG+KVVAPG N++K LLKS+I DNN V+F+EPKALY
Sbjct: 121 ASGSGIGSAAQHSQSLEAWLTHIPGIKVVAPGKANDAKGLLKSAIRDNNIVLFMEPKALY 180

Query: 181 GKKEEVNMDPDFYIPLGKGDIKREGTDLTIVSYGRMLERVMQAAEEVAEEGINVEVVDPR 240
GKKEEVN DPDFYIPLGKGDIKREGTDLTIVSYGRMLERV+QAAEEVA +GINVEVVDPR
Sbjct: 181 GKKEEVNQDPDFYIPLGKGDIKREGTDLTIVSYGRMLERVLQAAEEVAADGINVEVVDPR 240

Query: 241 TLIPLDKELIIDSVKKTGKLILVNDAYKTGGFTGEIATMVAESEAFDYLDHPIVRLASED 300
TLIPLDKELII+SVKKTGKL+LVNDAYKTGGF GEIATM+ ESEAFDYLDHPIVRLASED
Sbjct: 241 TLIPLDKELIIESVKKTGKLMLVNDAYKTGGFIGEIATMITESEAFDYLDHPIVRLASED 300

Query: 301 VPVPYSRVLEQGILPDVAKIKDAIYKVVNKG 331
VPVPY+RVLEQ ILPDV KIK AI K+ NKG
Sbjct: 301 VPVPYARVLEQAILPDVEKIKAAIVKMANKG 331





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1393 

A DNA sequence (GBSx1478) was identified in S.agalactiae <SEQ ID 4273> which encodes the amino
acid sequence <SEQ ID 4274>. Analysis of this protein sequence reveals the following: 

Possible site: 48

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.03 Transmembrane 161 - 177 ( 161 - 178)

Final Results

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>

WO 02/34771 PCT/GB01/04789

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9777> which encodes amino acid sequence <SEQ ID 9778> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04495 GB:AP001509 acetoin dehydrogenase (TPP-dependent) alpha 
chain [Bacillus halodurans] 
Identities = 148/317 (46%), Positives = 214/317 (66%), Gaps = 1/317 (0%)



Query: 8 LSKEQHLDMFLKMQRIRDVDMKENKLVRRGFVQGMTHFSVGEEAASVGAIQDLTDSDIIF 67

+++++ +D+F 4M IR + K 4+ +G + G TH +VG+EA++VG+I L + D +
Sbjct: 10 MTEKKLVDLFKQWLIRYFEEKVDEFFAKGMIHGTTHLAVGQEASAVGSIAVLEERDKLT 69

3 HRGHG IAKG D+ M AEL G+ TG KG+GGSMH+A++E+GN G NGIVGGG+++

? GD A+NEGSFHE+VMLA++W LPV+F NN+YG+S

Query: 248 GHSTADAGVYRTKEEVDSWKAKDPVKRYRAYLIENEIATEEELAAIEAQVIKEVEEGVKF 307
GHS +DA YRT+EE W+ KDP+ R RA L++ I TEEE +I+ + +++E+ V+F
Sbjct: 249 GHSKSDAKKYRTREEEKEWREKDPIARLRATLVKEGIVTEEEADSIQEEAKQKIEDSVQF 308

Query: 308 AEESPFPDMSVAFEDVF 324
A SP P++ EDV+
Sbjct: 309 ARNSPEEEIESLLEDVY 325



A related DNA sequence was identified in S.pyogenes <SEQ ID 4275> which encodes the amino acid 
sequence <SEQ ID 4276>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3502 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 244/326 (74%), Positives = 278/326 (84%)

Query: 1 MEVTMVTLSKEQHLDMFLKMQRIRDVDMKFNKLVRRGFVQGMTHFSVGEEAASVGAIQDL 60
ME MVT+SKEQHLDMFLKM+RIR+ D + NKLVRRGFVQGMTHFSVGEEAA+VGA+ L
Sbjct: 1 MEAEMVTVSKEQHLDMFLKMERIREFDSRINKLVRRGFVQGMTHFSVGEEAANVGAVAHL 60

Query: 61 TDSDIIFSNHRGHGQTIAKGIDIGGMFAELAGKATGTSKGRGGSMHLAWLEKGNYGTNGI 120
+ DIIFSNHRGHGQ+IAK +D+ M AELAGKATG SKGRGGSMHLA+ EKGNYGTNGI
Sbjct: 61

Query: 121
VGGGYALAVGAALTQQY+GT+NI +AFSGD ATNEGSFHESVN+AA W LPVIFFIINNR
Sbjct: 121

Query: 181
YGIS I +T PHLY RA+AYG+PG Y EDGND+MAVYE M + + +VR GNGPAIVE
Sbjct: 181
Query: 241 VESYRWFGHSTADAGVYRTKEEVDSWKAKDPVKRYRAYLIENEIATEEELAAIEAQVIKE 300

VESYRWFGHSTADAG YRTKEEVD WK KDP+ +YR YL IAT++EL AI+AQV KE
Sbjct: 241 VESYRWFGHSTADAGKYRTKEEVDEWKEKDPMIKYRTYLTSEGIATDDELDAIQAQVKKE 300

Query: 301 VEEGVKFAEESPFPDMSVAFEDVFVD 326

V++ +FA+ SP P++SVAFEDV+VD 
Sbjct: 301 VDDAYEFAQNSPDPELSVAFEDVWVD 326 

A related GBS gene <SEQ ID 8797> and protein <SEQ ID 8798> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -14.75
GvH: Signal Score (-7.5): -4.24 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -3.03 threshold: 0.0 

INTEGRAL Likelihood = -3.03 Transmembrane 161 - 177 ( 161 - 178) 
PERIPHERAL Likelihood = 3.55 117 
modified ALOM score: 1.11 

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
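The INTEGRAL line above reports a likelihood followed by two residue ranges for one predicted transmembrane segment. The Python sketch below parses such a line. It treats the first range as the core segment and the parenthesized range as that segment's extended boundaries. This reading is an assumption; the document does not define the two ranges. `parse_integral` is a hypothetical name.

```python
import re
from typing import Optional, Tuple, TypedDict


class TransmembraneCall(TypedDict):
    likelihood: float
    core: Tuple[int, int]      # first range on the line
    extended: Tuple[int, int]  # range in parentheses (assumed extended boundaries)
    core_length: int           # residues in the core range, inclusive


_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str) -> Optional[TransmembraneCall]:
    """Parse one INTEGRAL line; return None for any other line."""
    match = _INTEGRAL.search(line)
    if match is None:
        return None
    start, end, ext_start, ext_end = (int(g) for g in match.groups()[1:])
    return {
        "likelihood": float(match.group(1)),
        "core": (start, end),
        "extended": (ext_start, ext_end),
        "core_length": end - start + 1,
    }


call = parse_integral("INTEGRAL Likelihood = -3.03 Transmembrane 161 - 177 ( 161 - 178)")
print(call["core"], call["core_length"])  # (161, 177) 17
```

Lines of any other kind, such as the PERIPHERAL line, return None. The parser can therefore be run over a whole analysis block without pre-filtering.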

The protein has homology with the following sequences in the databases: 

ORF01791(298 - 1278 of 1578)

EGAD|108208|BS0806(3 - 327 of 333) acetoin dehydrogenase E1 component {Bacillus subtilis}
OMNI|NT01BS0951 acetoin:DCPIP oxidoreductase alpha subunit

GP|2780395|dbj|BAA24296.1||D78509 YfjK {Bacillus subtilis}

GP|2633130|emb|CAB12635.1||Z99108 acetoin dehydrogenase E1 component (TPP-dependent alpha
subunit) {Bacillus subtilis} GP|2957146|gb|ARC05582.1||AF006075 TPP-dependent acetoin
dehydrogenase, E1 alpha-subunit {Bacillus subtilis} PIR|D69581|D69581 acetoin dehydrogenase
E1 component (TPP-dependent alpha subuni) acoA - Ba
%Match =26.3 

%Identity =45.3 %Similarity =65.7 

Matches = 148 Mismatches = 109 Conservative Sub.s = 67 

231 261 291 321 351 381 411 441

F*IEMPFTKTKKAVQILASCEKNLYNN*VIKIFLEVRMWLSKEQHLDMFLKMQRIRDVDMKFNKLVRRGFVQGMTHFSV

GEEAVAVGVCAHLHDGDSITSTHRGHGHCIAKGCDLDGMMAEIFGKATGLCKGKGGSMHIADLDKGMLGANGIVGGGFTL

711 741 771 801 831 861 891 921

AVGAALTQQYEGTDNIVIAFSGDSATNEGSFHESWIA^VWLPVIFFIINJSKYGISTDITYSTKIPHLYMRADAYGIPG

AOSSALTAKYKQTKIWSVCFFGDG^

140 150 160 170 180 190 200 210

951 981 1011 1041 1071 1098 1128 1158

HYWDGITOLMAVYEKMHEVINYWSGNGPAIVEVESYRWFGHSTADAG

-VTVDGKDILAVYQAAEEAIERARNGGGPSLIECMTYRNYGHFEGDAQTYKTKDERVEHLEEroAIQGFKNYLL
220 230 240 250 260 270 280

1188 1218 1248 1278 1308 1338 1368 1398

EEELAAIEAQVIKEVEEGVKFAEESPFPDMSVAFEDVFVD*NNIjK*MRFISFYYSID*K^

--KLSDIEQRVSESIEKAVSFSEDSPYPKDSELLTDVTVSYEK3GM
300 310 320 330

SEQ ID 8798 (GBS403) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 171 (lane 2; MW 64.4kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 76 (lane 4; MW 39.5kDa). 

GBS403-GST was purified as shown in Figure 218, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1394 

A DNA sequence (GBSx1479) was identified in S.agalactiae <SEQ ID 4277> which encodes the amino
acid sequence <SEQ ID 4278>. This protein is predicted to be an ABC transporter. Analysis of this protein
sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2464 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9775> which encodes amino acid sequence <SEQ ID 9776>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12414 GB:Z99107 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 328/643 (51%), Positives = 443/643 (68%), Gaps = 9/643 (1%)


Query: 69 NKKRDLSLSYLAQDSRFQSENTIFQEMLQVFDSLREVEKRLRELELQMGQVSGSDLEQLM 128
K +D+++ YLAQ + S+ TI +E+L VFD L+ +EK +R +E +M +LE +M

Sbjct: 61 IKPKDITMGYIAQHTGLDSKLTIKEELLTVFDHLKAMEKEMRAMEEKMAAADPGELESIM 120

Query: 129 KTYDILSEEFREKGGFTYESDIKAILNGFKFNSDMWEMPISELSGGQNTRLALAKMLLEK 188
KTYD L +EF++KGG+ YE+D++++L+G F+ + LSGGQ TRLAL K+LL +

Sbjct: 121 KTYDRLQQEFKDKGGYQYEADVRSVLHGLGFSHFDDSTQVQSLSGGQKTRLALGKLLLTQ 180

Query: 189 PELLVLDEPTNHLDIDTIAWLENYLVNYQGALIIVSHDRYFLDKVATVTYDLTTHSLDRY 248
P+LL+LDEPTNHLDIDT+ WLE+YL Y GA++IVSHDRYFLDKV Y+++ +Y
Sbjct: 181 PDLLILDEPTNHLDIDTLTWLEHYLQGYSGAILIVSHDRYFLDKVVNQVYEVSRAESKKY 240

Query: 249 VGNYSKFMDLKAEKIATEEKNFEKQQKEIAKLEDFVQRNIVRASTTKRAQARRKQLEKME 308 

GNYS ++D KA + + K +EKQQ EIAKL+DFV RN+ RASTTKRAQ+RRKQLE+M+ 
Sbjct: 241 HGNYSAYLDQKAAQYEKDLmYEKQQDEIAKLQDFVDRNIARASTTKRAQSRRKQLERMD 300 


Query: 309 RLDKPNVEQKSANMTFHAGKVSGNVVLTLENAAIGYEG-VSLSEPIDLDVKKFDAIAIVG 367

+ KP ++KSAN F K SGN VL +++ I YE L + + + ++ A+VG
Sbjct: 301 VMSKPLGDEKSANFHFDITKQSGNEVLRVQDLTISYENQPPLLTEVSFMLTRGESAALVG 360

Query:
Sbjct: 361 PNGIGKSTLLKTLIDTLKPDQGTISYGSNVSVGYmQEQAELTSSKRVLDELWDEYPGLP 420

Query: 428 EVEIRNRLGAFLFSGDDVKKSVSMLSGGERARLLLAKLSMENNNFLILDEPTNHLDIDSK 487

E EIR LG FLFSGDDV K V LSGGE+ARL LAKL ++ NFLILDEPTNHLD+DSK 
Sbjct: 421 EKEIRTCLGNFLFSGDDVLKPVHSLSGGEKARIJUAKLMLQKANFLILDEPTNHLDLDSK 480 

Query: 488 EVLENALIEFDGTLLFVSHDRYFINRVATKVLEISDKGSTLYLGDYDYYLTKKAELEELA 547

EVLENALI++ GTLLFVSHDRYFINR+AT+VLE+S YLGDYDYY KK E EL

Sbjct: 481 EVLENALIDYPGTLLFVSHDRYFINRIATRVLELSSSHIEEYLGDYDYYTEKKTEQLELE 540

Query: 548 RLNEEEVSASKTEIDVTSD YETQKANQKEFRKITRRVVEIEARLEVLENDENNING 603

++N++E KT V SD YE +K +K+ R+ RR+ EIE ++ +E + + +
Sbjct: 541 KMNQQE-ETDKTPAWKSDSKRSYEEEKEWKKKERQRLRRIEEIETTVQTIEENISRNDE 599

Query: 604 LMLET HDIGKLSDLQKELESIQEEQLLLKEEWENLNMRLD 643 

L+ + D K+ + + E + +E L+ EWE L+ D 

Sbjct: 600 LLCDPEVYQDHEKVQAIHADNEKLNQELESLLSEWEELSTEED 642 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4279> which encodes the amino acid 
sequence <SEQ ID 4280>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2042 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 473/635 (74%), Positives = 545/635 (85%), Gaps = 1/635 (0%)

Query: 9 MIILQGNKIERSFSGDVLFDNINIQVDQRDRIALVGRNGAGKSTLLKILVGEEAPTKGEI 68

MIILQGNK+ERSFSGDVLF NI++QVD+RDRIALVG NGAGKSTLLK+LVGEE PT GE+
Sbjct: 1 MIILQGNKLERSFSGDVLFQNISLQVDERDRIALVGPNGAGKSTLLKLLVGEETPTSGEV 60

Query: 69 NKKRDLSLSYLAQDSRFQSENTIFQEMLQVFDSLREVEKRLRELELQMGQVSGSDLEQLM 128

N K+DL+LSYLAQ+SRF+S+ TI++EML+VF+ LR+ EKRLR++E+ M VSG L +LM
Sbjct: 61 NTKKDLTLSYLAQNSRFESDQTIYEEMLKVFEALRQDEKRLRQMEMDMATVSGQVLTRLM 120

Query: 129 KTYDILSEEFREKGGFTYESDIKAILNGFKFNSDMWEMPISELSGGQNTRLALAKMLLEK 188

YD+L+E FR++GGFTYESDIKAILNGFKF+ MW+M I+ELSGGQNTRLALAKMLLEK
Sbjct: 121 TDYDLLTEHFRQQGGFTYESDIKAILNGFKFDESNWQMTIAELSGGQNTRLALAKMLLEK 180

Query: 189 PELLVLDEPTNHLDIDTIAWLENYLVNYQGALIIVSHDRYFLDKVATVTYDLTTHSLDRY 248 

PELLVLDEPTNHLDI+TIAWLENYL NYQGALIIVSHDRYFLDKVATVT DLT + LDRY
Sbjct: 181 PELLVLDEPTNHLDIETIAWLE^riA^QGALIIVSHDRYFLDKVATVTLDLTPNGLDRY 240

Query: 249 VGNYSKFMDLKAEKIATEEKNFEKQQKEIAKLEDFVQRNIVRASTTKRAQARRKQLEKME 308 

GNYS+FM LKAEK+ EEK F+KQQKEIAKLEDFVQ+NIVRASTTKRAQARRKQLEK+E
Sbjct: 241 SGNYSRFRAIiKAEKLVAEEKQFDKQQKEIAKLEDFVQKNIVRASTTKRAQARRKQLEKIE 300 

Query: 309 RLDKPNVEQKSANMTFHAGKVSGNVVLTLENAAIGYEGVSLSEPIDLDVKKFDAIAIVGP 368

RLDKP +KSA+MTFHA K SGNVVL +E AAIGY LSEPI++D+ K DAIA+VGP
Sbjct: 301 RLDKPTGGRKSAHMTFHAEKPSGNVVLRVEEAAIGYGDQVLSEPINVDINKLDAIAVVGP 360

Query: 369 NGIGKSTLIKSLVGQIPFIKGEAKLGANVETGYYDQSQSNLTKTNTVLDELWDAFSTTPE 428

NGIGKSTLIKS++GQ+P +KG+ K GANVETGYYDQ+QS+LT +NTVL+ELW FSTTPE
Sbjct: 361 NGIGKSTLIKSIIGQLPLLKGQLKYGANVETGYYDQTQSHLTSSNTVLEELWQDFSTTPE 420

Query: 429 VEIRNRLGAFLFSGDDVKKSVSMLSGGERARLLLAKLSMENNNFLILDEPTNHLDIDSKE 488

V+IRNRLGAFLFSGDDVKKSV+MLSGGE+ARLLLAKLSMFJ^FL+LDEPTNHLDIDSKE
Sbjct: 421 VDIRNRLGAFLFSGDDVKKSVAMLSGGEKARLLLAKLSME^ 480



Query: 489 VLENALIEFDGTLLFVSHDRYFINRVATKVLEISDKGSTLYLGDYDYYLTKKAELEELAR 548
VLENALI+FDGTLLFVSHDRYFINR+ATKVLEI++ GSTLYLGDYDYYL KKAELEELAR
Sbjct: 481 VLENALIDFDGTLLFVSHDRYFINRIATKVLEITEHGSTLYLGDYDYYLEKKSELEELAR 540

Query: 549 LNEEEVSASKTEIDVTSDYETQKANQKEFRKITRRVVEIEARLEVLENDENNINGLMLET 608

L E E T DY+ QKANQKE R++TRR EIEARLE +E I M +

Sbjct: 541 LAAGETVEETKEASAT-DYQLQKANQKERRRLTRRYEEIEARLETIEERIGAIQEDMHAS 599

Query: 609 NDIGKLSDLQKELESIQEEQLLLMEEWENLNMRLD 643 

ND +L QKE + + +EQ LMEEWE + +++ 
Sbjct: 600 NDTAQLIAWQKEWDQLDQEQEALMEEWETIAEQIE 634 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1395 

A DNA sequence (GBSx1480) was identified in S.agalactiae <SEQ ID 4281> which encodes the amino
acid sequence <SEQ ID 4282>. This protein is predicted to be thiophene degradation protein F (thdF).
Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0876 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9773> which encodes amino acid sequence <SEQ ID 9774> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4283> which encodes the amino acid 
sequence <SEQ ID 4284>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0795 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 384/458 (83%), Positives = 427/458 (92%)

Query: 12 MSITKEFDTIAAISTPLGEGAIGIVRISGTDALKIASKIYRGKDLSAIQSHTLNYGHIVD 71

MSITKEFDTI AISTPLGEGAIGIVR+SGTDAL IA +++GK+L + SHT+NYGHI++
Sbjct: 1 MSITKEFDTITAISTPLGEGAIGIVRLSGTDALAIAQSVFKGKNLEQVASHTINYGHIIN 60

Query: 72 PDKiraiLDEvMLGvMIAPKTFTREDVIEIOT 131 

P I+DEVM+ VMLAPKTFTRE + V+ E INTHGGI AVTNE I LQL+ +R GARMAEPGEFTK 
Sbjct: 61 PKTGTIIDEWVSVMIAPKTFTREF^INTHGGIAVTNEILQLLIRQGARMAEPGEFTK 120 

Query: 132 RAFLNGRVDLTQAEAVMDLIRAKTDKAMDIAVKQLDGSLKTLINNTRQEILNTLAQVEVN 191

RAFLNGRVDLTQAEAVMD+IRAKTDKAM IAVKQLDGSL LIN+TRQEILNTLAQVEVN
Sbjct: 121 RAFLNGRVDLTQAEAVMDIIRAKTDKAMTIAVKQLDGSLSQLINDTRQEILNTLAQVEVN 180

Query: 192 IDYPEYDDVEEMTTTLMREKTQEFQALMENLLRTARRGKILREGLSTAIIGRPNVGKSSL 251

IDYPEYDDVEEMTT L+REKTQEFQ+L+E+LLRTA+RGKILREGLSTAIIGRPNVGKSSL
Sbjct: 181 IDYPEYDDVEEMTTALLREKTQEFQSLLESLLRTAKRGKILREGLSTAIIGRPNVGKSSL 240 

Query: 252 LNNLLREEKAIVTDIEGTTRDVIEEYVNIKGVPLKLVDTAGIRDTDDIVEKIGVERSKKA 311 

LNNLLRE+KAIVTDI GTTRDVIEEYVNIKGVPLKLVDTAGIR+TDD+VE+IGVERSKKA
Sbjct: 241 LNNLLREDKAIVTDIAGTTRDVIEEYVNIKGVPLKLVDTAGIRETDDLVEQIGVERSKKA 300

Query: 312 LEEADLVLLVLNSSEPLTLQDRSLLELSKESNRIVLLNKTDLPQKIEVNELPKNVIPISV 371

L+EADLVLLVLN+SE LT QDR+LL LS++SNRI+LLNKTDL QKIE+ +LP + IPISV
Sbjct: 301 LQEADLVLLVLNASEKLTDQDRALLNLSQDSNRIILLNKTDLEQKIELEQLPDDYIPISV 360

Query: 372 LENENIDKIEERINDIFFDNAGMVEHDATYLSNARHISLIEKAVDSLKAVNEGLELGMPV 431

L N+NI + IE+RIN +FFDNAG+VE DATYLSNARHISLIEKAV SL+AVN+GL LGMPV 
Sbjct: 361 LTNQNINLIEDRINQLFFDNAGLVEQDATYLSNARHISLIEKAVQSLEAVNDGLALGMPV 420 

Query: 432 DLLQVDMTRTWEILGEITGDAAPDELITQLFSQFCLGK 469 

DLLQVD+TRTWEILGEITGDAAPDELITQLFSQFCLGK
Sbjct: 421 DLLQVDLTRTWEILGEITGDAAPDELITQLFSQFCLGK 458 
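Each alignment block gives start and end coordinates. The midline shows identical residues as letters and similar ones as '+'. The sketch below does the bookkeeping for one block. It assumes gaps are written as '-' and that the two rows are already aligned column for column. The function name is hypothetical. On the last block above (query 432 to 469), it recovers the end coordinate and 37 identical positions out of 38.

```python
from typing import Tuple


def block_stats(q_start: int, query: str, subject: str) -> Tuple[int, int]:
    """Return (query end coordinate, identical positions) for one block.

    Assumes gaps are written as '-' and both rows have equal length.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    # Only residues (not gap characters) advance the sequence coordinate.
    residues = sum(1 for c in query if c != "-")
    identities = sum(1 for a, b in zip(query, subject) if a == b and a != "-")
    return q_start + residues - 1, identities


q = "DLLQVDMTRTWEILGEITGDAAPDELITQLFSQFCLGK"
s = "DLLQVDLTRTWEILGEITGDAAPDELITQLFSQFCLGK"
print(block_stats(432, q, s))  # (469, 37)
```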

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1396 

A DNA sequence (GBSx1481) was identified in S.agalactiae <SEQ ID 4285> which encodes the amino
acid sequence <SEQ ID 4286>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.13 Transmembrane 280 - 295 ( 276 - 299) 
INTEGRAL Likelihood = -4.83 Transmembrane 249 - 265 ( 243 - 266)
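Each INTEGRAL line reports a likelihood and a predicted transmembrane segment, followed by a wider range in brackets. The sketch below reads such a line into numbers. It assumes the layout shown above. The regular expression and the `TMSegment` type are illustrative and are not part of the original analysis software. Treating the bracketed range as an extended boundary around the first range is also an assumption.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TMSegment(NamedTuple):
    likelihood: float
    core: Tuple[int, int]      # first range, e.g. 280 - 295
    extended: Tuple[int, int]  # bracketed range, e.g. ( 276 - 299)


# Pattern written against the INTEGRAL lines shown in this text (an assumption
# about the layout, not a published specification of the program's output).
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str) -> Optional[TMSegment]:
    """Parse one INTEGRAL line; return None if it lists no segment."""
    m = _INTEGRAL.search(line)
    if m is None:
        return None
    lik, a, b, c, d = m.groups()
    return TMSegment(float(lik), (int(a), int(b)), (int(c), int(d)))


seg = parse_integral("INTEGRAL Likelihood = -9.13 Transmembrane 280 - 295 ( 276 - 299)")
print(seg)
print(seg.core[1] - seg.core[0] + 1)  # 16 residues in the first range
```

Lines that give only a likelihood, such as "INTEGRAL Likelihood = -7.80" later in this text, return None instead of a segment.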

Final Results 

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD40365 GB:AF036485 hypothetical protein [Plasmid pNZ4000] 
Identities = 88/306 (28%), Positives = 149/306 (47%), Gaps = 17/306 (5%)

Query: 1 MIVEQKFGNGFTWIN IEAEQLRTETSEIQAKY-LDSEIITYALDDYERAFMECSHIK 56 

MI +K NG WI I AE+ T ++ +Y +D +11 Y D+ E I 
Sbjct: 1 MIKPEKTINGTKWIETIQINAEERAT LEDQYGIDEDI IEYVTDNDESTNYVYD- IN 55 

Query: 57 GKEVLTIIFNTIDLKQKESYYETVPMTFCLSHDRLITVTRSRNSYMLELLQKYLDRNPDV 116 

+ LI L+ YTP L LT+S +LLD NP+V 

Sbjct: 56 EDDQLFIFLAPYALDKDALRYITQPFGMLLHKGVLFTFNQSGIPEVNTALYSALD-NPEV 114 

Query: 117 -SPK^FLFAALTLITKQYFNWSKIDREKDILNRQLREQTTNKRLLAMSDLETGSVYLLT 175 

S F+ L + + + I ++++ L++ L +T N L+++S L+ +L + 
Sbjct: 115 KSVDAFILETLFTVWSFIPISRAITKKRNYLDKKLNRKTKNSDLVSLSYLQQTLTFLSS 174 

Query: 176 AANQNALVLEQLDVHPSQRFNSEVBKEQLS DALIEAHQLVSMTQLNSQVLSQLSSTF 232 

A N L +LD P F +++++ D IE Q+ M ++ +QV+ ++ T 
Sbjct: 175 AVQTN LSELDRLPKTHFGVGADQDKIDLFEDVQIEGEQVQRMFEIETQWDRIDHTL 231 

Query: 233 NNVLNNNUCTLTGLNIISINIAIIAAITGFFGKNIPLPLTESRSSWLIVIATSVLLWVI 292 

N++ NNNLN+ + L I S+ +A+ I+GF+GMN+ LPL + +W++ + SV+L V 
Sbjct: 232 NSLANNNIiNDTMKFLTIWSLTMAVPTIISGFYGMNVKLPIAGMQYAWMLTLGISWLIVA 291 

Query: 293 IAQILK 298 

+ +LK 
Sbjct: 292 MLIMLK 297 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1397

A DNA sequence (GBSx1482) was identified in S.agalactiae <SEQ ID 4287> which encodes the amino
acid sequence <SEQ ID 4288>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1437 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1398 

A DNA sequence (GBSx1483) was identified in S.agalactiae <SEQ ID 4289> which encodes the amino
acid sequence <SEQ ID 4290>. This protein is predicted to be exonuclease RexA. Analysis of this protein 
sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.3165 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9771> which encodes amino acid sequence <SEQ ID 9772> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

= 73/1211 (6%) 

Query: 28 KHTPEQIEAIYTFGNIWLVSASAGSGKTFVIWERILDKLLRGVPIDSLFISTFTVKAAGE 87

K TPEQ EAI++ G N+LVSASAGSGKTFVM + RI++K+ +G+ ID LFISTFT KAA E
Sbjct: 5 KLTPEQNEAIHSSGKNILVSASAGSGKTFVMAQRIVEKVKQGIEIDRLFISTFTKKAASE 64

Query: 88 LKERLEKKINESLKSAESDDLKQFLTQQLVGIQTADIGTMDAFTQKIVNQYGYTLGISPI 147

L+ RLE+ + ++ + + D+ LT L + ADIGTMD+FTQK+ + I P
Sbjct: 65 LRMRLERDLKKARQESSDDEEAHRLTLALQNLSNADIGTMDSFTQKLTKANFNRVNIDPN 124

F KL+KNFS +R +

Y +Y F+ +T+NP W++ FLKG +TY +++ D

V S+D L

Query: 380 AFEFSDIAHFAIQILEENHDIRQLYQDKYHEVMVDEYQDNNHTQERMLELLSNGHNRFMV 439

AFE+SDIAHFAI+ILEEN DIR+ ++ Y E+M+DEYQD +HTQERMLELLSNGHN FMV
Sbjct: 339 AFEYSDIAHFAIEILEENPDIRENLREHYDEIMIDEYQDTSHTQERMLELLSNGHNLFMV 398

+T E+I+DS+

++ L+WKIY + +Y+DYVGAL E R

GLEF YVF++N+ +F+ D+ +ILSR G+G+KY+AD++ E

Query: 854 LPYQLNKRELRLATLSEQMRLLYVAMTRAEKKLYLVGKASQTKWADHYDLVS-ENNH 909

+ A LSE+MR+LYVA TRA+KKLYLVGK T + YD + E

Query: 910 LPLASRETFVTFQDWLLAVHETYKKQELFYDINFVSLEELTDHHIGMVNPSLPFNPDNK- 968

+ FQ W+LA+ K L +N + +EL + + PD K
Sbjct: 875 LSDKFRNSSRGFQHWILALQNATK---LPMKLNVYTKDELETEKLEFTSQPDFKK 926

++DF DF KK A +GSA H MQ + S + Q L E+
Sbjct: 987 QPVSEFVRVKNLDFS--DFG-PKKITAAEMGSATHSFMQYADF-SQADLFSFQATLDEMG 1042

Query: 1082 AETSVKAAIQIEKINYFFQETSLGKYIQEEVEHLHREAPFAMLKEDPESGEKFVVRGIID 1141

+ +K I I KI F +T G+++ E V+ +EAPF+ML+ D + E+++VRGI D 
Sbjct: 1043 FDEKIKNQIDITKILTLF-DTEFGQFLSENVDKTVKEAPFSMLRTDEFAKEQYIVRGICD 1101 

Query: 1142 GYLLLENRIILFDYKTDKFVNP LELKERYQGQMALYAEALKKSYEIEKIDKYLILLG 1198 

G++ L ++IILFDYKTD+F N E+KERY+ QM LY+EAL+K+Y + +IDKYLILLG 

Sbjct: 1102 GFVKIADKIILFDYKTDRFTNVSAISEIKERYKDQMNLYSEALQKAYHVNQIDKYLILLG 1161

Query: 1199 G-KQLEWKMD 1208 

G +++ V K+D 
Sbjct: 1162 GPRKVFVEKID 1172 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4291> which encodes the amino acid
sequence <SEQ ID 4292>. Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC12966 GB:U76424 exonuclease RexA [Lactococcus lactis] 
Identities = 478/1206 (39%), Positives = 700/1206 (57%), Gaps = 65/1206 (5%) 

Query: 40 KRTAQQIEAIYTSGQNILVSASAGSGKTFVMVERILDKILRGVSIDRLFISTFTVKAATE 99 

K T +Q EAI++SG+NILVSASAGSGKTFVM +RI++K+ +G+ IDRLFISTFT KAA+E 
Sbjct: 5 KLTPEQNEAIHSSGKNILVSASAGSGKTFVMAQRIVEKVKQGIEIDRLFISTFTKKAASE 64

Query: 100 LRERIENKLYSQIAQTTDFQMKVYLTEQLQSLCQADIGTMDAFAQKVVSRYGVSIGISSQ 159

LR R+E L +++D + LT LQ+L ADIGTMD+F QK+ + I 

Sbjct: 65 LRMRLERDLKKARQESSDDEEAHRLTIALQNLSNADIGTMDSFTQKLTKANFNRVNIDPN 124

Query: 160 FRIMQDKAEQDVLKQEVFSKLFNEFMNQKEA PVFRALVKNFSGNCKDTSAFRELV 214 

FRI+ D+ E D+++QEVF +L +++ E+ F L+KNFS + ++ F+++V

Sbjct: 125 FRILADQTESDLIRQEVFEQLVESYLSADESLNISKDKFEKLIKNFSKD-RNILGFQKVV 183

Query: 215 YTCYSFSQSTENPKIWLQENFLSAAKTYQRLEDIPDHDIELLLLAMQDTANQLRDVTDME 274 

YT Y F+ +TENP WL4 FL +TY+ LD+ + D+ 4 T+L+ + 

Sbjct: 184 YTIYRFASATENPISWLENQFLKGFETYKSLTDLSE-DFTVNVKENLLTFFELLEAISKK 242 

Query: 275 DYGQLTKAG-SRSAKYTKHLTIIEKLSDWWDFKCLYGKAGLDRLIRDVTGLIPSGNDVT 333 

D+ T S + E LS +DF D+ 
Sbjct: 243 DFVTCTALFLSIDTDIRVGSSKDEALSALKKDFSA QKQDLV 283 



Sbjct: 284 GSKSKPGELRKFVDKIK---HGQLIEKYQNQAFEIASDLQKFIIDFYKTYLERKKNENAF 340 

Query: 394 EFSDIAHFAIKILEENTDIRQSYQQHYHEVMVDEYQDNNHMQERLLTLLSNGHNRFMVGD 453 

E+SDIAHFAI+ILEEN DIR++ ++HY E+M+DEYQD +H QER+L LLSNGHN FMVGD

Sbjct: 341 EYSDIAHFAIEILEENPDIRENLREHYDEIMIDEYQDTSHTQERMLELLSNGHNLFMVGD 400 

Query: 454 IKQSIYRFRQADPQIFNQKFRDYQKKPEQGKVILLKENFRSQSEVLNVSNAVFSHLMDES 513

Sbjct: 401 

Query: 514 VGDVLYDEQHQLIAG--SHAQTVPYLDRRAQLLDYNSDKDDGNAPSDSEGISFSEVTIVA 571

+G++ Y ++ L+ G S D +LLDY + + IS E+ A

Sbjct: 461 LGEMTYGPCEEALVQGNISDYPVEAEKDFYPELLLYKENTSEEEIEDSEVKISDGEIKGAA 520 



+EI KL + GV +DI +LV S++ N+ I Y IP+ D G+ ++LKS+EV++ML 

Sbjct: 521 QEIKKL-IEYGVEPKDIAILVRSKSNNNKIEDILLSYDIPWLDEGRVDFLKSMEVLIML 579 

Query: 632 DTLRTINNPRNDYALVALLRSPMFAFDEDDLARIALQKDNELDKDCLYDKIQRAVIGRGA 691

D LR I+NP D +LVA+LRSP+F F+ED+L RI++Q +L +DKI ++ G 

Sbjct: 580 DVLRAIDNPLYDLSLVAMLRSPLFGFNEDELTRISVQGSRDLR---FWDKILLSLKKEGK 636 

Query: 692 HPELIHDTLLGKLNVFLKTLKSWRRYAKLGSLYDLIWKIFNDRFYFDFVASQAKAEQAQA 751 

+PELI+ +L KL F + WR+ ++ L+WKI+ + +YFD+V + E QA 

Sbjct: 637 NPELINLSLEQKLKAFNQKFTEWRKLVNKIPIHRLLWKIYTETYYFDYVGALKNGEMRQA 696 

Query: 752 NLYALALRANQFEKSGYKGLYRFIKMIDKVLETQNDLADVEVATPKQAVNLMTIHKSKGL 811 

NL AL++RA +E SGYKGL++F+++I+K +E NDLA V + P+ AV +MT HKSKGL 
Sbjct: 697 NLQALSVRAESYESSGYKGLFKFVRLINKFMEQNNDLASVNIKLPQNAVRVMTFHKSKGL 756 

Query: 812 QFPYVFILNCDKRFSMTDIHKSFILNRQHGIGIKYLADIKGLLGE-TTLNSVKVSMETLP 870 

+F YVF++N RF+ D4- + IL+R+HG+G+KY+AD+K T V MET P 

Sbjct: 757 EFDYVFLMNLQSRFNDRDLKEDVILSREHGLGMKYIADLKAEPDVITDFPYALVKMETFP 816 

Query: 871 YQLNKQELRIATLSEEMRLLYVAMTRAEKKVYFIGK---ASKSKSQEITDPKKL-GKLLP 926

Y +NK + A LSEEMR+LYVA TRA+KK+Y +GK K E+ D L GK+L 

Sbjct: 817 YMVNKDLKQRAALSEEMRVLYVAFTRAKKKLYLVGKIKDTDKKAGLELYDAATLEGKILS 876 

Query: 927 IALREQLLTFQDWLLAIADIFSTEDLYFDVRFIEDSDLTQESVGRLQTP---QLLNPDDL 983 

R FQ W+LA+ + L + +L E + P +L+ + 

Sbjct: 877 DKFRNSSRGFQHWILALQ---NATKLPMKLNVYTKDELETEKLEFTSQPDFKKLVEESEK 933

Query: 984 KDNRQSETIARALDMLEAVSQENANY--EAAIHLPTVRTPSQL-KATYEPLLEPIGVDII 1040 

DN S + ++ EA +N Y +AA L +++TPSQ+ K +YE L+ V + 

Sbjct: 934 FDNIMSFSD EIKEAQKIMNYQYPHQAATELSSIQTPSQVKKRSYEKQLQVGEVQPV 989 

Query: 1041 EKSSRSLSDFTLPHFSKKAKVEASHIGSALHQLMQVLPLSKP--INQQTLLDALRGIDSN 1098 

+ R + + F K K+ A+ +GSA H MQ S+ + Q LD + G D 

Sbjct: 990 SEFVR-VKNLDFSDFGPK-KITAAEMGSATHSFMQYADFSQADLFSFQATLDEM-GFD-- 1044

Query: 1099 EEVKTALDLKKIESFFCDTSLGQFFQTYQKHLYREAPFAILKLDPISQEEYVLRGIIDAY 1158 

E++K +D+ KI + F DT GQF +EAPF+4-L+ D ++E+Y+-+RGI D + 

Sbjct: 1045 EKIKNQIDITKILTLF-DTEFGQFLSENVDKTVKEAPFSMLRTDEFAKEQYIVRGICDGF 1103 

Query: 1159 FLFDDHIVLVDYKTDKYKQP---IELKKRYQQQLELYAEALTQTYKLPVTKRYLVLMGGG 1215

D I+L DYKTD++ E+K+RY+ Q+ LY+EAL + Y + +YL+L+GG 

Sbjct: 1104 VKIADKIILFDYKTDRFTNVSAISEIKERYKDQMNLYSEALQKAYHVNQIDKYLILLGGP 1163

Query: 1216 KPEIVE 1221 

Sbjct: 1164 RKVFVE 1169 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 728/1211 (60%), Positives = 916/1211 (75%), Gaps = 5/1211 (0%) 



Query: 61 RILDKLLRGVPIDSLFISTFTVKAAGELKERLEKKINESLKSAESDDLKQFLTQQLVGIQ 120

RILDK+LRGV ID LFISTFTVKAA EL+ER+E K+ + +K +LT+QL +
Sbjct: 73 RILDKILRGVSIDRLFISTFTVKAATELRERIENKLYSQIAQTTDFQMKVYLTEQLQSLC 132

ADIGTMDAF QK+V++YGY++GIS FRI+QDK EQDV+K EV++ LF+++M K A

F LVKNFSGK KD+ AFRE+VY Y+FSQST+NPK W+Q FL A+TY E IPD

+ LL MQ TANQLRD+TD EDY QLT G +A Y KHL HE L W +DF LYGK

RD+T +IPSGNDVTV+ VKYP+FK LH +HLE I YQ +

H QER+L LLSNGHNRFMVGDIKQSIYRFRQADPQIFN K++ YQ P QGK+I+LKENF

RSQSEVL+ +N+VF+HLMDE VGD+LYDE HQL AGS Q

Query: 541 IDDSDSQQYDISPAEAICLVAKEIIRLHKEEIMVPFQDITLLVSSRTRNDGILQTFDRYGIP 600

++ S IS +E +VAKEII+LH ++ VPF+DITLLVSSRTRND I TF++YGIP 
Sbjct: 553 -GNAPSDSEGISFSEVTIVAKEIIKLHNDKGVPFEDITLLVSSRTRNDIISHTFNQYGIP 611 

Query: 601 LOTDGGEQNYLKSVEVIWMLDTIJWIDNPIiNDYALVALDRSPMFGFNEDDLTRIAIQD-- 658 

+ TDGG+QHYLKSVEVMVMLDTLR+I+NP NDYALVALLRSPMF F+EDDL RIA+Q 
Sbjct: 612 IATDGGQQHYLKSVEVMVMLDTLRTINNPRNDYALVALLRSPMFAFDEDDLARIALQKDN 671

Query: 659 --VKMAFYHKVKLSYHKEGHHSDLITPELSSKIDHFMKTFQTWRDFAKWHSLYDLIWKIY 716 

K Y K++ + G H +L1 L K++ F+KT ++WR +AK SLYDLIWKI + 
Sbjct: 672 ELDKDCLYDKIQRAVIGRGAHPELIHDTLLGKLNVFLKTLKSWRRYAKLGSLYDLIWKIF 731

Query: 717 NDRFYYDYVGALPKAEQRQANLYALALRANQFEKTGFKGLSRFIRMIDKVLENENDLADV 776

NDRFY+D+V + KAEQ QANLYALALRANQFEK+G+KGL RFI+MIDKVLE +NDIADV 
Sbjct: 732 NDRFYFDFVASQAKAEQAQANLYALALRANQFEKSGYKGLYRFIKMIDKVLETQNDLADV 791

Query: 777 EVALPQNAVNLMTIHKSKGLEFKYVFIENIDKKFSMVDITSPLILSRNQGIGIKYVADMR 836

EVA P+ AVNLMTIHKSKGL+F YVFILN DK+FSM DI IL+R GIGIKY+AD++ 
Sbjct: 792 EVATPKQAVNLMTIHKSKGLQFPYVFILNCDKRFSMTDIHKSFILNRQHGIGIKYLADIK 851

Query: 837 HELEEEILPAVKVSMETLPYQLNKRELRLATLSEQMRLLYVAMTRAEKKLYLVGKASQTK 896

L E L +VKVSMETLPYQLNK+ELR+ATLSE+MRLLYVAMTRAEKK+Y +GKAS++K
Sbjct: 852 GLLGETTLNSVKVSMETLPYQLNKQELRIATLSEEMRLLYVAMTRAEKKVYFIGKASKSK 911

Query: 897 WADHYDLVSENNHLPLASRETFVTFQDWLLAVHETYKKQELFYDIKFVSLEELTDHHIGM 956

+ D LPLA RE +TFQDWLLA+ + + ++L++D+ F+ +LT +G 

Sbjct: 912 SQEITDPKKLGKLLPLALREQLLTFQDWLLAIADIFSTEDLYFDVRFIEDSDLTQESVGR 971

Query: 957 VNPSLPFNPDNKVENRQSEDIVRAISVLESVEQINQTYKAAIELPTVRTPSQVKKIYEPI 1016 

+ NPD+ +NRQSE I RA+ +LE+V Q+N Y+AAI LPTVRTPSQ+K YEP+ 

Sbjct: 972 LQTPQLLNPDDLKDNRQSETIARALDMLEAVSQENANYEAAIHLPTVRTPSQLKATYEPL 1031



Query: 1017 I 

L+ GVD++E +++ DF LP FS K + + +GSA+H+LMQ + +S + + + A 
Sbjct: 1032 LEPIGVDIIEKSSRSLSDFTLPHFSKKAKVEASHIGSALHQLMQVLPLSKPINQQTLLDA 1091

Query: 1077 LTEVWAETSVKAAIQIEKINYFFQETSLGKYIQEEVEHLHREAPFAMLKEDPESGEKFVV 1136

L +++ VK A+ ++KI FF +TSLG++ Q +HL+REAPFA+LK DP S E++V+ 
Sbjct: 1092 LRGIDSNEEVKTALDLKKIESFFCDTSLGQFFQTYQKHLYREAPFAILKLDPISQEEYVL 1151 

Query: 1137 RGIIDGYLLLENRIILFDYKTDKFVNPLELKERYQGQMALYAEALKKSYEIEKIDKYLIL 1196 

RGIID Y L ++ I+I, DYKTDK+ P+ELK+RYQ Q+ LYAEAL ++Y++ +YL+L 
Sbjct: 1152 RGIIDAYFLFDDHIVLVDYKTDKYKQPIELKKRYQQQLELYAEALTQTYKLPVTKRYLVL 1211 

Query: 1197 LGGKQLEWKM 1207 

+GG + E+V++ 
Sbjct: 1212 MGGGKPEIVEV 1222

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1399 

A DNA sequence (GBSx1484) was identified in S.agalactiae <SEQ ID 4293> which encodes the amino
acid sequence <SEQ ID 4294>. This protein is predicted to be exonuclease RexB. Analysis of this protein 
sequence reveals the following: 

d N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0660 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



WO 02/34771 



PCT/GB01/04789 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC12965 GB:U76424 exonuclease RexB [Lactococcus lactis]
Identities = 363/1093 (33%), Positives = 604/1093 (55%), Gaps = 67/1093 (6%) 

Query: 1 MKLLYTDINHDMTEILVNQAAHAAEAGWRIFYIAPNSLSFEKERAVLENLPQ---EASFA 57

M++LYT+I D+TE L+ A E +++YI P+S+SFEKB+ +LE h + A F 
Sbjct: 1 MEILYTEITQDLTEGLLEIALEELEKNRKVYYIVPSSMSFEKEKEILERLAKGSDTAVFD 60 

Query: 58 ITITRFAQLARYFTIMQP-NQKESITOIGLAMIFYRALASFEDGQLKVFGRLKQDASFIS 116 

+ +TRF QL YF + K L +GL+M+F R L SF+ ++ ++ L+ A F+ 
Sbjct: 61 LLVTRFKQLPYYFDKREKATMKTELGTVGLSMLFRRVLRSFKKDEIPLYFSLQDSAGFLE 120 

Query: 117 QLVDLYKELQTANLSILELKYLHSPEKFEDLLAIFLWSDLLREGEYDNQSKIAFFTEQV 176

L+ L EL TANLS+ L ++ + +LA F + EY N S+ FT ++ 
Sbjct: 121 MLIQLRAELLTANLSVENLPDNPKNQELXKILAKFEAELSV EYANYSEFGDFTNRL 176 

Query: 177 RSGQLDVDLKNTILIVDGFTRFSAEEEALIKSLSSRCQEIIIGAYASQKAYKANFTNGNI 236 

G+ D LK+ +I+DG+TRFSAEEE I+S+ + ++G Y+ + + A +1 
Sbjct: 177 VDGEFDQQLKDVTIIIDGYTRFSAEEELFIESIQEKVARFVVGTYSDENSLTAG--SETI 234

Query: 237 YSAGVDFLRYLATTFQTKPEFILSKWESKSGFEMISK NIEGKHDFTNSSHILDDT 291 

Y + T F+ K L K S + E+ SK +++ + T+ L 

Sbjct: 235 YVGTSQMI TRFRNKFPVELRKIASSAVNEVYSKLTRILDLDSRFVITDEKIELKAE 290 

Query: 292 AKDCITIVffiCIHQKDEVEHVARAIRQKLYQGYRYKDILVLLGD\7DSYKLQLSKIFEQYDI 351 

+ IWE NQK E+E VA+ IRQK+ QG +KD VL+GD +Y++ L ++F+ Y+I 
Sbjct: 291 DEKYFRIWEAENQKVEIERVAKEIRQKIIQGAFFKDFTVLVGDPAAYEITLKEVFDLYEI 350

Query: 352 PYYFGKAETMAAHPLVHFMDSLSRIKRYRFRAEDVLNLFKTGIYGEISQDD--LDYFEAY 409

P+++ + E+M+ HPLV F +SL IK+ +R +DV+NL K+ +Y + + D+ +DYFE Y 
Sbjct: 351 PFFYAQEESMSQHPLVIFFESLFAIKXNNYRTDDVVNLLKSKVYTDANLDEEVIDYFEYY 410

Query: 410 ISYADIKGPKKFFTDFWGAKKFDLGRLNTIRQSIiL TPLESFV- KTKKQDGIKTLNQ 465 

+ I G KKF +F+ ++ + +N +R+ LL +PL+ F+ +K+ G K ++ 
Sbjct: 411 VQKYKISGRKKFTEEFIE-SEFSQIELVNEMREKLLGSESPLQVFLGNNRKKTGKKWVSD 469 

Query: 466 FMFFLTQVGLSDNLSRLVGQMS - ENEQE KHQEVWKTFTDILEQFQTIFGQEKLNLDE 521 

L + N++ +NE + KH++VW+ L +F +F EKL E 

Sbjct: 470 LQGIiLENGNVMTNMNAYFSAAELQNEHQMADiQIEQVWQML I STLNEFLAVFSDEKLKSVE 529 

Query: 522 FLSLLNSGMMQAEYRMVPATVDVVTVKSYDLVEPHSNQFVYALGMTQSHFPKIAQNKSLI 581 

FL +L +G+ A+YR +PA VDW VK Y+LVEP +N+++YA+G++Q++FP+I +N 4L+ 
Sbjct: 530 FLDILLAGLKNAKYRQIPANVDVVNVKDYELVEPKTNKYIYAIGLSQTNFPRIia<NSTLL 589 

Query: 582 SDIERQLINDANDTDGHFDIMTQENLKKNHFAALSLFNAAKQELVLTIPQLLNESEDQMS 641

SD ER IN D + + + N +KN F LSL N+AK+ LVL++PQ++ + + S 
Sbjct: 590 SDEERLEINQTTDENQFIEQLOTANYQKNQFTVLSLINSAKESLVLSMPQIMANEQGEFS 649

Query: 642 P-YLVELRDIGVPFOTKGR-QSLKEE1ADNIG1TYKALLSRVVDLYRSAIDKEMTKEE-QTF 698 

P + + L+D K + 4-L E ++IGN +++++ + + R ++ETE+ + F 

Sbjct: 650 PVFQLFLKDADEKILQKIQGVNLFESLEHIGNSRSVIAMIGQIERELVESEETSEDKRVF 709 

Query: 699 WSVAWYLRRQLTSKGIEIPIITDSLDTVTVS8DVMTRRFPEDDPLKLSSSALTTFYNNQ 758 

WS R L + + + +DTV ++D + + + D + SS+ FYN + 

Sbjct: 710 WSSIFRILVKSNADFQKILLDLAKDIDTVNLAPDTLEQIY- -GDKIYASVSSFERFYNCE 767 

Query: 759 YKYFLQYVLGLEEQDSIHPDMRHHGTYLHRVFEILMKNQGI--ESFEEKLNSAINKTNQE 816
Y+YFL+ C LE ++I + + G + H VFE +MK + E+F+EKL + + ++ 
Sbjct: 768 YQYFLENTLSLETFENIDINSKIVGKFFHEVFEKVMKETDLSAENFDEKLTLVLQEVDKN 827 

Query: 817 DVFKSLYSEDAESRYSLEILEDIARATATILR QDSQMTVESE EERFELM 865 

+ +++DA +R++ LE+I R TAT+L+ D T+ +E E 

Sbjct: 828 --YSRYFTQDATARFTWSNLEEIVRQTATVLKATVSTDELKTLLTESSFGLPKSELGNFS 885 

Query: 866 IDNTIKINGIIDRIDRLSDGSLGVVDYKSSAQKPDIQKFYNGLSPQLVTYIDAISRDKEV 925 

+D+ I + G IDR+D+LS LG +DYKSSA F +Q+ Y+GDS Q +TY+D I K+ 
Sbjct: 886 VDD-IYLRGRIDRLDQLSTDYLGAIDYKSSAHSFKLQEAYDGLSLQFMTYLDVI---KQA 941 

Query: 926 EQKPPIFGAMYLHMQEPRQDLSKIIO&DDLOTKimQALTYKGLFSEaEKEFLANGKYHL- 984

I+GA+YL + 4LS+I L ++ +++ Y+GL E E + G ++ 

Sbjct: 942 FPNQKIWGALYLQFKNQPINLSEINQLSEIAN1LKESMRYEGLVLEDAAEQI-KGIENIA 1000 

Query: 985 --KDSLYSETEIAILQAHNQSLYKKASETIKSGKFLINPYTEDAKTVDGD Q 1033 

K ++Y+E E L N+ Y+ A + +K GK INP + ++ +D 
Sbjct: 1001 LKKTNIYiNEEEFEQLLKLNEEHYRAAGQRLKKGKIAINPIMKRSEGIDQSGNVRGCRYCP 1060 

Query: 1034 FKSITGFEADRHM 1046 

KSI FEA+ HM 
Sbjct: 1061 LKSICRFEANIHM 1073 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4295> which encodes the amino acid 
sequence <SEQ ID 4296>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1891 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 546/1075 (50%), Positives = 758/1075 (69%), Gaps = 11/1075 (1%)

Query: 1 MKLLYTDINHDMTEILVNQAAHAAEAGWRIFYIAPNSLSFEKERAVLENLPQEASFAITI 60

MKL+YT++++ MTEILVN+A AA+ G+R+FYIAPNSLSFEKER VL LP+ +F+I +
Sbjct: 1 MKLIYTEMSYSMTEILVNEARKAADQGYRVFYIAPNSLSFEKEREVLTLLPERGTFSIIV 60

Query: 61 TRFAQLARYFTLNQPNQKESIJroiGLftMIFYRALASFEDGQLKVFGRLKQDASFISQLVD 120

TRF Q++RYFT+ K+ L+D LAMIFYRAL + L +GRL+ ++ FI QLV+
Sbjct: 61 TRFVQMSRYFTVESSPSKQHLDDTTLAMIFYRALMQLKPEDLPSYGRLQNN8VFIEQLVE 120

C +QK+E+EHVA++IRQKLY+GYRYKDILVLLGD+D+Y+LQ+ IF++++IPYY GKAE

Y L LN +RQ ++ PL+ K++KQ G

S+ E EK++EVWK FTDIL F IFGQEKti L + L+L+ +GM A+YR+VP

AT+DWT+ KS YDLV+ PHS FVYA+G+TQSHFPK + L+SD ER IN+

Query: 600 DIMTQENLKKNHFAALSLFNAAKQELVLTIPQLLNESEDQMSPYLVELRDIGVPFNHKGR 659

DI + EN KKNH ALSLFNAA +ELVL++ ++NE+ D +SPYL EL + G+P KG+
Sbjct: 598 DIASAENSKKNHQTALSLFNAATKELVLSVSTVINETFDDLSPYLKELINFGLPLLDKGK 657

+IGNYKALLS+++ + E

- FW+V +RYLR+QL + +E+P

R HG YLHRVFE LMK+ E F+ KL AI TNQE F+ +Y ++AE+ YSL ILEDI
Sbjct: 777 RIHGQYLHRVFERLMKDHTQEPFDNKLKQAIYHTNQESFFQQVYQDNAEAEYSLAILEDI 836

DI FYNGLSPQL+TY+ I

++ALTYKG+FSE EKE L+ G Y K++LYS E+ L +N+ LY KA++ IK G



Query: 1017 FLINPYTEDAKTVDGDQFKSITGFEADRHMARARALYKLPAKEKRQGFLTLMQQE 1071 

FLINPYT D KTV GDQ K+IT FEAD M +AR L LPAKEK++ FLTLM++E 
Sbjct: 1014 FLINPYTSDGKTVQGDQLKAITRFEADLDMGQARRLVTLPAKEKKECFLTLMRKE 1068 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1400 

A DNA sequence (GBSx1485) was identified in S.agalactiae <SEQ ID 4297> which encodes the amino
acid sequence <SEQ ID 4298>. Analysis of this protein sequence reveals the following: 



Possible site: 31 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -7.80 



Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8799> which encodes amino acid sequence <SEQ ID 8800> 
was also identified. Analysis of this protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -20.62 
GvH: Signal Score (-7.5): -6.25 

Possible site: 31 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -7.80 threshold: 
INTEGRAL Likelihood = -7.80 
PERIPHERAL Likelihood = 3.34 
modified ALOM score: 2.06 






>** Reasoning Step: 3 






Final Results 

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC75528 GB:AE000334 orf, hypothetical protein [Escherichia coli K12] 
Identities = 138/297 (46%), Positives = 193/297 (64%), Gaps = 16/297 (5%) 

Query: 5 MKIDDLRKSDNVEDRRSSSGGSFSSGGSGLPILQLLLLRGSWKTKLVVLIILLLLG--GG 62

M+ R+SDNVEDRR+SSGG S GG G + S K L++LI++L+ G G
Sbjct: 1 MRWQGRRESDNVEDRRNSSGGP-SMGGPGFRL PSGKGGLILLIVVLVAGYYGV 52

HEVGHH+Q LGI K +++ T+ E N L+VR+ELQAD +AGW H +

D EEA+NAA A+GDD LQ+++ G++VPDSFTHGT++QR



A related DNA sequence was identified in S.pyogenes <SEQ ID 4299> which encodes the amino acid
sequence <SEQ ID 4300>. Analysis of this protein sequence reveals the following:

Possible site: 41 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.42 Transmembrane 48 - 64 ( 41 - 67)

Final Results 

bacterial membrane --- Certainty=0.356S (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC75528 GB:AE000334 orf, hypothetical protein [Escherichia coli]

Identities = 143/301 (47%), Positives = 195/301 (64%), Gaps = 21/301 (6%)

Query: 1 MKTDDLRESQQVEDRRGQSSG-SFGGGGLGGGLLLQLLFSRGGWKTKLVILLLLLVMG-- 57

M+ RES VEDRR S G S GG G +L +GG L++L+++LV G 
Sbjct: 1 MRWQGRRESDNVEDRRNSSGGPSMGGPGF RLPSGKGG LILLIVVLVAGYY 50

Query: 58 GGGLSGVLGGKPSSTNNNAYQSSQ 1 /TRTNGDKASQEQVSFVSK\''FASTEDYWTKTFREKG 117 

G L+G++ G+P S QS++ N D+A++ F S + A+TED W + F + G 
Sbjct: 51 GVDLTGLMTGQPVSQQ QSTRS I SPKEDEAAK FTSVILATTEDTWGQQFEKMG 102 

Query: 118 LTYHKPTLVLYTGATQTACGRGQASSGPFYCPGDQKVYLDISFYNELSTKYGAKGDFAMA 177

TY +P LV+Y G T+T CG GQ+ GPFYCP D VY+D+SFY+++ K GA GDFA 
Sbjct: 103 KTYQQPKLVMYRGMTRTGCGAGQSIMGPFYCPADGTVYIDLSFYDDMKDKLGADGDFAQG 162 

Query: 178 WIAHEVGHHIQNELGIMDNYASARC^KSKAKANQLNVKLELQADYYAGAWANYVQGQGL 237 

YVIAHEVGHH+Q LGI +Q ++A+ N+L+V++ELQAD +-AG W + +Q QG+ 

Sbjct: 163 YVIAHEVGHHVQKLLGIEPKVRQLQQmTQAEVl^LSVRraLQADCFAGVMGHSMQQQGV 222

Query: 238 LEKGDIEEAMSAAHAVGDDTLQEETYGRTVPDSFTHGTSKQRQRWFDRGYQYGDFEHGDTF 298

LE GD+EEA+ AA A+GDD LQ+++ GR VPDSFTHGTS+QR WF RG+ GD +TF 
Sbjct: 223 LETGDLEEAMNAAQAIGDDRLQQQSQGRVVPDSFTHGTSQQRYSWFKRGFDSGDPAQCNTF 283

An alignment of the GAS and GBS proteins is shown below.

Identities = 191/303 (63%), Positives = 241/303 (79%), Gaps = 5/303 (1%)

Query: 5 MKIDDLRKSDNVEDRRSSSGGSFSSGG-SGLPILQLLLLRGSWKTKLVVLIILLLLGGGG 63

MK DDLR+S VEDRR S GSF GG G +LQLL RG WKTKLV+L++LL++GGGG
Sbjct: 1 MKTDDLRESQQVEDRRGQSSGSFGGGGLGGG-LLLQLLFSRGGWKTKLVILLLLLVMGGGG 60

Y +P LVLYT + QT CG G+++SGPFYC D+K+YLDISFYNELS KYGA GDFAMAYV

IAHEVGHHIQ ELGIMD Y R G +K +AN LNV+LELQADYYAG WA+Y++G+ LLE

+GD EEAM AAHAVGDDTLQ+ETYG+ VPDSFTHGT++QRQRWF++G+QYGD +HGDTFS



SEQ ID 8800 (GBS404) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 171 (lane 3; MW 62kDa). 

GBS404-GST was purified as shown in Figure 218, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1401 

A DNA sequence (GBSx1486) was identified in S.agalactiae <SEQ ID 4301> which encodes the amino
acid sequence <SEQ ID 4302>. This protein is predicted to be phenylalanyl-tRNA synthetase beta chain 
(pheT). Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2617 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14823 GB:Z99118 phenylalanyl-tRNA synthetase (beta subunit) 
[Bacillus subtilis] 
Identities = 376/805 (46%), Positives = 523/805 (64%), Gaps = 6/805 (0%)

Query: 1 MLVSYKWLKELVDVD-VTTAELAEKMSTTGIEVEGVETPAEGLSKLVVGHIVSCEDVPDT 59

M VSYKWL++ VD+ + A LAEK++ GIEVEG+E EG+ +V+GH++ E P+ 
Sbjct: 1 MFVSYKWLEDYVDLKGMDPAVLAEKITRAGIEVEGIEYKGEGIKGVVIGHVLEREQHPNA 60

Query: 60 H-LHLCQVDTGDDELRQVVCGAPNVKTGINVIVAVPGARIADNYKIKKGKIRGMESLGMI 118
L+ C VD G + Q++CGAPNV G V VA GA + N+KIKK K+RG ES GMI

Sbjct: 61 DIOjNKCLVDIGAEaPVQIICGAPNVDKGQWAVA^/GAVLPGNFKIKKAKLRGEESNGMI 120 

Query: 119 CSLQELGLSESIIPKEFSDGIQILPEGAIPGDSIFSYLDLDDEIIELSITPNRADALSMR 178

CSLQELG+ ++ KE+++GI + P A G + L WD I+EL +TPNRADA++M 
Sbjct: 121 CSLQELGIESKLVAKEYAEGIFVFPlTOAETGSDAIjAALQLDDAIL^ 180 

Query: 179 GVAHEVAAIYGKKVHFEEKNLIEEAERAADKISVVIESDKVLS-YSARIVKNVTVAPSPQ 237

GVA+EVAAI +V + + +E+A+D ISV IE + Y+A+I+KNVT+APSP 
Sbjct: 181 GVAYEVA^ILDTEVKLPQTDYPARSEQASDYISVKIEDQESNPLYTAKIIKNVTIAPSPL 240 

Query: 238 WLQNKLMNAGIRPINNWDVTNYVLLTYGQPMHAFDFDKFDGT^ 297 

W-l-Q KLMNAGIRP NNVVD+TN+VLL YGQP+HAFD+D+F +V R A E 4-+TLD 
Sbjct: 241 WMQTKL^AGIRPHJSnSFVVDITNFVLIjEYGQPLHAFDYDRFGSKEVVVRKAAENEMI\rri^ 300 

Query: 298 GEERDLIADDLVIAVNDQPVALAGVMGGQSTEIGSSSKTVVLEAAVFNGTSIRKTSGRLN 357

+ER L AD LVI + A+AGVMGG +E+ +KT++LEAA FNG +RK S h 
Sbjct: 301 DQERKLSADHLVITNGTKAQAVAGVMGGAESEVQEDTKTILLEAAYFNGQKVRKASKDLG 360



Query: 418 VNTRLGTELTYTDIEEVFEKLGFAISGSEVKFTVLVPRRRWDIAIQADLVEEIARIYGYE 477

V++ LG ++ ++ ++++LGF + ++ V VP RR DI I+ DL+EE AR+YGY+
Sbjct: 421 VSSVLGLTISKEELISIYKRLGFIVGEADDLLVVTVPSRRGDITIEEDLIEEAARIYGYD 480

Query: 478 KLPTTLPEAGATAGELTSMQRLRRRVRTVAEGAGLSEIITYALTTPEKAVQFSTQATNIT 537 

+P+TLPE T G LT Q RR+VR EGAGLS+ ITY+LT +KA F+ + + T 
Sbjct: 481 NIPSTLPETAGTTGGLTPYQAKRRKVRRFLEGAGLSQAITYSLTAEKKATAFAIEKSLQT 540

Query: 538 E^PmTORSALRQmWSGMLDTIA-mVARKI^SNLAVYEIGKVFEQTGNPKEDLPTEVE 597 

L PM+ +RS LR +4V +U3+++YN+AR+ ++A+YE+G VF +¥ PEE 

Sbjct: 541 VIALPMSEERSILRHSLVPWLLDSVSYNLARQTDSVALYEVGSVF--LTKEEDTKPVETE 598 

Query: 598 TFTFALTGLVEEKDFQTKSKPVDFFYAKGIVEALFIKLK-LDVTFVAQKGLASMHPGRTA 656

A+TGL ++ +Q + KPVDFF KGIVE L KL LD Q +HPGRTA 

Sbjct: 599 RVAGAVTGLWRKQLWQGEIQCPVDFFVVKGIVEGI1I1DKI1NVLDSIEFVQSEIRKQLHPGRTA 658 

Query: 657 



Query: 717 IALLLAESVSHHDIVSAIETSGVKRLTAIKLFDVYAGNNIAEGYKSMAYSLTFQNPNDNL 776

IAL4 ++V+ + S 1+ +G K L + +FDVY G ++ EG KS+A+SL + NP L 
Sbjct: 719 lALVTDKTVTSGQLESVIKEAGGKIjLKEVrVFDVYEGEHMEEGKKSVAFSLQYVNPEQTL 778 

Query: 777 TDEEVAKYMEKITKSLVEKVWAEIR 801 

T+EEV K K+ K+L + A +R 
Sbjct: 779 TEEEVTKAHSKVLKALEDTYQAVLR 803 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4303> which encodes the amino acid 
sequence <SEQ ID 4304>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.1283 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 595/801 (74%), Positives = 687/801 (85%)

Query: 1 MLVSYKWLKELVDVDVTTAELAEKMSTTGIEVEGVETPAEGLSKLVVGHIVSCEDVPDTH 60

MLVSYKWLKELVD+DVT A LAEKMSTTGIEVEG+E PA+GLSKLVVGH++SCEDVP+TH
Sbjct: 6 MLVSYKWLKELVDIDVTPAALAEKMSTTGIEVEGIEVPADGLSKLVVGHVLSCEDVPETH 65

Query: 61 LHLCQVDTGDDELRQVVCGAPNVKTGINVIVAVPGARIADNYKIKKGKIRGMESLGMICS 120

LHLCQVDTGD+ RQ+VCGAPNVK GI VIVAVPGARIADNYKIKKGKIRGMESLGMICS
Sbjct: 66 LHLCQVDTGDETPRQI VCGAPNVK&G I KVIVAVPGARIADNYKI KKGKI RGMESLGMI CS 125 

Query: 121 LQELGLSESIIPKEFSDGIQILPEGAIPGDSIFSYLDLDDEIIELSITPNRADALSMRGV 180 

LQELGLS+SIIPKEFSDGIQILPE A+PGD+IF YLDLDD IIELSITPNRADALSMRGV
Sbjct: 126 LQELGLSDSIIPKEFSDGIQILPEEAVPGDAIFKYLDLDDHIIELSITPNRADALSMRGV 185

Query: 181 AHEVAAIYGKKVHFEEKNLIEEAERAADKISWIESDKVLSYSARIVKNVTVAPSPQWLQ 240 

AHEVAAIYGK V F +KNL E + ++ I V I SD VL+Y++R+V+NV V PSPQWLQ 
Sbjct: 186 AHEVAAIYGKSVSFPQKNLQESDKATSEAIEVAIASDNVLTYASRVVENVKVKPSPQWLQ 245

Query: 241 NKLMNAGIRPINNVVDVTNYVLLTYGQPMHAFDFDKFDSTTIVARKAENGEKLITLDGEE 300

N LMNAGIRPINNVVDVTNYVLL +GQPMHAFD+DKF+ IVAR A GE L+TLDGE+
Sbjct: 246 NLLMNAGIRPINNVVDVTNYVLLYFGQPMHAFDYDKFEDHKIVARAARQGESLVTLDGEK 305

Query: 301 RDLIADDLVIAVNDQPVALAGVMGGQSTEIGSSSKTVVLEAAVFNGTSIRKTSGRLNLRS 360

RDL +DLVI V D+PVALAGVMGGQ+TEI ++S+TVVLEAAVF+G SIRKTSGRLNLRS
Sbjct: 306 RDLTTEDLVIVVADKPVALAGVMGGQATEIDANSQTVVLEAAVFDGKSIRKTSGRLNLRS 365

Query: 361 ESSSRFEKGINYDTVSEAMDFAAAMLQELAGGQVLSGQVTEGVLPTEPVEVSTTLGYVNT 420

ESSSRFEKG+NY TV EA+DFAAAMLQELA GQVLSG V G LPTEPVEVST+L YVN 
Sbjct: 366 ESSSRFEKGVNYATVLEALDFAAAMLQELAEGQVLSGHVQAGQLPTEPVEVSTSLDYVNV 425 

Query: 421 RLGTELTYTDIEEVFEKLGFAISGSEVKFTVLVPRRRWDIAIQADLVEEIARIYGYEKLP 480 

RLGTELT+ DI+ +F++LGF ++G E FTV VPRRRWD++I ADLVEEIARIYGY+KLP
Sbjct: 426 RLGTELTFADIQRIFDQLGFGLTGDETSFTVAVPRRRWDVSIPADLVEEIARIYGYDKLP 485 

Query: 481 TTLPEAGATAGELTSMQRLRRRVRTVAEGAGLSEIITYALTTPEKAVQFSTQATNITELM 540 

TTLPEAG TA ELT Q LRR+VR +AEG GL+EII+YALTTPEKAV+F+ +++TELM 
Sbjct: 486 TTLPEAGGTAAELTPTQALRRKVRGLAEGLGLTEIISYALTTPEKAVEFAVAPSHLTELM 545 



Query: 601 FALTGLVEEKDFQTKSKPVDFFYAKGIVEALFIKLKLDVTFVAQKGLASMHPGRTATILL 660 

FA+ GLV +KDFQT+++ VDF++AKG ++ LF L h V +V K LA+MHPGRTA ILL 
Sbjct: 606 FAICGLVAQKDFQTQAQAVDFYHAKGOTliDTLFANLNLKVQWPTKDLANMHPGRTALILL 665

Query: 661 DGKEIGFVGQVHPQTAKQYDIPETYVAEINLSTISSQMNQALIFEDITKYPSVSRDIALL 720 

D + IGFVGQVHP TAK Y IPETYVAE++++ +E+ + F +ITK+P+++RD+ALL 

Sbjct: 666 DEQVIGFVGQVHPGTAKAYSIPETYVAELDMAALEAALPSDQTFAEITKFPAMTRDVALL 725 

Query: 721 LAESVSHHDIVSAIETSGVKRLTAIKLFDVYAGNNIAEGYKSMAYSLTFQNPNDNLTDEE 780 

L VSH IV+AIE++GVKRLT+IKLFDVY G I G KSMAYSLTFQNPNDNLTDEE
Sbjct: 726 LDREVSHQAIVTAIESAGVKRLTSIKLFDVYEGATIQAGKKSMAYSLTFQNPNDNLTDEE 785 

Query: 781 VAKYMEKITKSLVEKVNAEIR 801 

VAKYMEKITK+L E+V AE+R 
Sbjct: 786 VAKYMEKITKALTEQVGAEVR 806 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1402 

A DNA sequence (GBSx1487) was identified in S.agalactiae <SEQ ID 4305> which encodes the amino
acid sequence <SEQ ID 4306>. Analysis of this protein sequence reveals the following:

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0653 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9769> which encodes amino acid sequence <SEQ ID 9770> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15205 GB:Z99120 transcriptional regulator [Bacillus subtilis] 
Identities = 60/169 (35%) , Positives = 100/169 (58%) 

Query: 17 ITFKCTGLDNVWILQNIAIETFRQTFSHDNSEEQIjQRFBTOESYTLPVLKSEITHAESDTY 76 
+ KK +++ LQ ++IETF TF NS E ++A+ ++ L+ E+++ S +

Sbjct: 3 VKMKKCSREDLQTLQQLSIETFNDTFKEQNSPENI1KAYLESAFNTEQLEKELSNMSSQFF 62 

Query: 77 FWLDTDLVGYLKVNWGSQQTEKDLDKAFEIQRIYLLDAYQGQG1GKATFEFALDLAYKS 136 
F+Y D ++ GY+KVN Q+E+ ++ EI+RIY+ +++Q G+GK A+++A + 

Sbjct: 63 FIYFDHEIAGYVKVNIDDAQSEEMGAESIiEIERIYIKNSFQKHGLGKHLIiNKAIEIALER 122

Query: 137 GLDWAWLGVWEFNHKAQAFYAKYGFEKFSEHQFSVGDKVDTDWLLRKSL 185 

WLGVWE N A AFY K GF + H F +GD+ TD ++ K+L 
Sbjct: 123 NKKNIWLGVTOKlffiNAIAFYKKMGFVOTGAHSFYMGDEEQTDLIMAKTL 171 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1403 

A DNA sequence (GBSx1488) was identified in S.agalactiae <SEQ ID 4307> which encodes the amino
acid sequence <SEQ ID 4308>. This protein is predicted to be phenylalanyl-tRNA synthetase (alpha
subunit) (pheS). Analysis of this protein sequence reveals the following:

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3937 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9339> which encodes amino acid sequence <SEQ ID 9340> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14824 GB:Z99118 phenylalanyl-tRNA synthetase (alpha subunit)
[Bacillus subtilis]

Identities = 209/338 (61%) , Positives = 270/338 (79%) , Gaps = 2/338 (0%) 



Query: 1 MKISTQEKLKEM-TGNHTKELQDLRVQVLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEV 59

+K QE L+++ + K + D+RVQ LGKKG +TE+L+G+ LS + RP +G NEV 
Sbjct: 5 LKQLEQFALEQVEAASSLKVvTvTDIRVQYLGKKGPITEVLRGMGKLSAEERPKMGALANEV 64

Query: 60 RDILTKAFEEQAKVVEAAKIQAQLESESVDVTLPGRQMTLGHRHVLTQTSEEIEDIFLGM 119 

R+ + A ++ + +E +++ +L +++DVTLPG + +G RH LT EEIED+F+GM 
Sbjct: 65 RERIAmiADKNEKLEEEEMKQKl^GQTIIWTLPGNPvAVGGRHPLTVVIEEIEDLFIGM 124 

Query: 120 GFQVVDGFEVEKDYYNFERNNLPKDHPARDMQDTFYITEEILLRTHTSPVQARTMDQHDF 179 

G+ V +G EVE DYYNFE +NLPK+HPARDMQD+FYITEE L+RT TSPVQ RTM++H+ 
Sbjct: 125 GYTVEEGPEVETDYYNFESLNLPKEHPARDMQDSFYITEETLMRTQTSPVQTRTMEKHE- 183 



Query: 180 SKGPLKMISPGRVFRRDTDDATHSHQFHQIEGLVVGENISMGDLKGTLQLISQKMFGAER 239
KGP+K+I PG+V+RRD DDATHSHQF QIEGLVV +NISM DLKGTL+L+++KMFG +R
Sbjct: 184 GKGPVKIICPGKVYRRDNDDATHSHQFMQIEGLVVDKNISMSDLKGTLELVAKKMFGQDR 243

Query: 240 KIRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCKQTGWIEILGAGMVHPSVLEMSGIDSE 299
+IRLRPS+FPFTEPSVEVDV+CFKCGG GC+VCK TGWIEILGAGMVHP+VL+M+G D +

Sbjct: 244 EIRLRPSFFPFTEPSVEVDVTCFKCGGNGCSVCKGTGWIEILGAGMVHPNVLKMAGFDPK 303 

Query: 300 KYSGFAFGLGQERIAMLRYGINDIRGFYQGDVRFTDQF 337 
+Y GFAFG+G ERIAML+YGI+DIR FY DVRF QF 
Sbjct: 304 EYQGFAFGMGVERIAMLKYGIDDIRHFYTHDVRFISQF 341

A related DNA sequence was identified in S.pyogenes <SEQ ID 4309> which encodes the amino acid 
sequence <SEQ ID 4310>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2806 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 305/337 (90%) , Positives = 327/337 (96%) 

Query: 1 MKISTQEKLKEMTGNHTKELQDLRVQVLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEVR 60

+K T E L+ +TGNHTKELQDLRV VLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEVR
Sbjct: 36 LKTKTLETLQSLTGNHTKELQDLRVAVLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEVR 95



Query: 61 DILTKAFEEQAKVVEAAKIQAQLESESVDVTLPGRQMTLGHRHVLTQTSEEIEDIFLGMG 120 

D+LTKAFEEQAK+VEAAKIQAQL++ES+DVTLPGRQMTLGHRHVLTQTSEEIEDIFLGMG
Sbjct: 96 DLLTKAFEEQAKIVEAAKIQAQLDAESIDVTLPGRQMTLGHRHVLTQTSEEIEDIFLGMG 155

Query: 121 FQVVDGFEVEKDYYNFERNNLPKDHPARDMQDTFYITEEILLRTHTSPVQARTMDQHDFS 180

FQ+VDGFEVEKDYYNFERMNLPKDHPARDMQDTFYITEEILLRTHTSPVQART+DQHDFS 
Sbjct: 156 FQIVDGFEVEKDYYNFERMNLPKDHPARDMQDTFYITEEILLRTHTSPVQARTLDQHDFS 215 

Query: 181 KGPLKMISPGRVFRRDTDDATHSHQFHQIEGLVVGENISMGDLKGTLQLISQKMFGAERK 240

KGPLKM+SPGRVFRRDTDDATHSHQFHQIEGLVVG+NISMGDLKGTL++I +KMFG ER
Sbjct: 216 KGPLKMVSPGRVFRRDTDDATHSHQFHQIEGLVVGKNISMGDLKGTLEMIIKKMFGDERS 275

Query: 241 IRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCKQTGWIEILGAGMVHPSVLEMSGIDSEK 300 

IRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCK+TGWIEILGAGMVHPSVLEMSG+D+++ 
Sbjct: 276 IRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCKKTGWIEILGAGMVHPSVLEMSGVDAKE 335




Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1404 

A DNA sequence (GBSx1489) was identified in S.agalactiae <SEQ ID 4311> which encodes the amino
acid sequence <SEQ ID 4312>. Analysis of this protein sequence reveals the following:

Possible site: 13 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2834 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1405 

A DNA sequence (GBSx1490) was identified in S.agalactiae <SEQ ID 4313> which encodes the amino
acid sequence <SEQ ID 4314>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2762 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1406 

A DNA sequence (GBSx1491) was identified in S.agalactiae <SEQ ID 4315> which encodes the amino
acid sequence <SEQ ID 4316>. This protein is predicted to be a DNA-entry nuclease. Analysis of this protein
sequence reveals the following:

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8801> which encodes amino acid sequence <SEQ ID 8802>
was also identified. Analysis of this protein sequence reveals the following:

Lipop Possible site: -1 Crend: 5 
McG: Discrim Score: 10.13 
GvH: Signal Score (-7.5): -5.07 

Possible site: 23 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -6.79 threshold: 0.0

INTEGRAL Likelihood = -5.79 Transmembrane 8 - 24 ( 6 - 27)
PERIPHERAL Likelihood = 6.26 258 
modified ALOM score: 1.8S 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.3718 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA38134 GB:X54225 membrane nuclease [Streptococcus pneumoniae] 
Identities = 154/232 (56%), Positives = 180/232 (77%), Gaps = 1/232 (0%) 

Query: 41 KIWSGTPSRELSESVLTSNVKKQLGTNIAWNQSGAFIINQNKTDLNAKVSSAPYAINEIK 100

K S PS+ L+ESVLT VK Q+ 4+ WN SGAFI+N NKT+L+AKVSS PYA N+ K 
Sbjct: 43 KQASEAPSQAIAESVLTDAVKSQIlfflSLEl-MGStaPIVlIGNKTNLDAKVSSKPyADNKTK 102 

Query: 101 KVNNQIVPTKOTALLTKATRQYP^IRSETGNGRTYMKPAGIJHQINGLKGSYNHAVDRGHliI 160
V 4 VPT ANALL+KATRQY+KR+ETGNG T W P GWHQ+ IiKGSY HAVDRGHL4

Sbjct: 103 TVGKETVPWAWALLSKATRQYKNRKETGNGSTSWTPPGWHQVKNLKGSYTHAVDRGHLL 162

Query: 161 GYALVG3LRGFDASTSNPKNIATQAAWANQAKSNQSTGQNYYETLVRKALDRHKTVRYRV 220 
GYAL+G L GFDASTSNPKNIA Q AWANQA 4 STGQNYYE4 VRKALD44K VRYRV 
Sbjct: 163 GYALIGGDDGFDASTSNPKNIAVQTAWANQAQAEYSTGQITYYESfWRKALDQKKRVRYRV 222

Query: 221 TLIY-DRDNIjLSSGSHIEAKSSDGSLEFNVFIPNVQSGLLFDYATGKVKQTK 271 

TL Y 4+L4 S S IEAKSSDG LEFNV 4PHVQ GL DY TG4V T+ 
Sbjct: 223 TLYYASNEDLVPSflSQIEAKSSDGELEFl^fLiVPNVQKGLQLDYRTGEVTVTQ 274 


There is also homology to SEQ IDs 368 and 1302. 

SEQ ID 8802 (GBS285) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 56 (lane 6; MW 32kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 60 (lane 7; MW 57.5kDa).

GBS285-GST was purified as shown in Figure 208 (lane 7) and Figure 225 (lane 8).

GBS658 was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 134 (lane 8 & 9; MW 27kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1407

A DNA sequence (GBSx1492) was identified in S.agalactiae <SEQ ID 4317> which encodes the amino
acid sequence <SEQ ID 4318>. Analysis of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1408

A DNA sequence (GBSx1493) was identified in S.agalactiae <SEQ ID 4319> which encodes the amino
acid sequence <SEQ ID 4320>. This protein is predicted to be UDP-N-acetylglucosamine 1-carboxyvinyltransferase (murA). Analysis
of this protein sequence reveals the following:



WO 02/34771 



PCT/GB01/04789 



>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1814 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9767> which encodes amino acid sequence <SEQ ID 9768> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15693 GB:Z99122 UDP-N-acetylglucosamine
1-carboxyvinyltransferase [Bacillus subtilis]
Identities = 248/423 (58%) , Positives = 323/423 (75%) , Gaps = 5/423 (1%) 

Query: 5 MDKI I VEGGQTQLQGQ WT EGAKNAVLPLIAATILPSC&KTLLTNVPILSDVFTMNNVVR 64 

M+KI1V GGQ +L G V +EGAKNAVLP++AA++L S+ K+++ +VP LSDV+T+N V+R 
Sbjct: 1 MKKIIWGGQ-KLNGTVTCVEGAKNAVLPVIAASLIASEEKSVICDVPTLSDVYTINEVLR 59

Query: 65 GLDIQVDFNCDKKEILVDASGDILDVAPYEFVSQMRASIVVLGPILARNGHAKVSMPGGC 124

L V F + E+ V+AS + AP+E+V +MRAS++V+GP+LAR GHA+V++PGGC 
Sbjct: 60 HLGADVHF--ENNEVTVNASYALQTEAPFEYWKMRASVLVMGPLLARTGHARVALPGGC 117 

Query: 125 TIGSRPIDLHLKGLEAMGATITQNGGDITAQAE-KLKGANIYMDFPSVGATQNLIMAATL 183

IGSRPID HLKG EAMGA I G I A+ + +L+GA IY+DFPSVGAT+NL+MAA L 
Sbjct: 118 AIGSRPIDQHLKGFEAMGAEIKVGNGFIEAEVKGRLQGAKIYLDFPSVGATENLIMAAAL 177

Query: 184 ASGTTTIENAAREPEITOLAQLIiNKMGAK^^ 243 

A GTTT+EN A+EPEIVDLA +N MG K++GAGT T+ I GV+ LHG +H ++ DRIEA 
Sbjct: 178 AEGTTTLENVAKEPEIVDLANYINGMGGKIRGAGTGTIKIEGVEKLHGVKHHIIPDRIEA 237

Query: 244 GTFMVAAAMTSGNVLVKDAITOHNRPLISKLMEMGVEVSEEEDGIRVKADTICICLKPVTVK 303 

GTFMVAAA+T GNVLVK A+ EH LI+KH- EMGV + +E +G+RV K+LKP+ +K 
Sbjct: 238 GTFMVAAAITEGNVLVKGAVPEHLTSLIAKMEEMGVTIKDEGEGLRV-IGPKELKPIDIK 296 

Query: 304 TLPHPGFPTDMQAQFTALMAVVNGESTMIETVFENRFQHLEEMRRMGLQTEILRDTAMIH 363

T+PHPGFPTDMQ+Q AL4- +G S + ETVFENRF H EE RRM +1 + +1+ 
Sbjct: 297 TMPHPGFPTDMQSQMMALLLRASGTSMITETVFENRFMHAEEFRRMNGDIKIEGRSVIIN 356 

Query: 364 GGRALQGAPVMSTDLRASAALILAGMVAQGQTVVGQLTHLDRGYYQFHEKLAALGANIKR 423

G LQGA V +TDLRA AALILAG+VA+G T V +L HLDRGY FH+KLAALGA+I+R 
Sbjct: 357 GPVQLQGAEVAATDLRAGAALILAGLVAEGHTRVTELKHLDRGYVDFHQKLAALGADIER 416

Query: 424 VSE 426 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4321> which encodes the amino acid 
sequence <SEQ ID 4322>. Analysis of this protein sequence reveals the following: 

Possible site: 39
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.03 Transmembrane 377 - 393 ( 376 - 394) 



Final Results 

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15693 GB:Z99122 UDP-N-acetylglucosamine
1-carboxyvinyltransferase [Bacillus subtilis]
Identities = 248/423 (58%) , Positives = 318/423 (74%) , Gaps = 5/423 (1%)

Query: 1 VDKIIIEGGQTRLEGEVVIEGAKNAVLPLLAASILPSKGKTILRNVPILSDVFTMNNVVR 60

++KII+ GGQ +L G V +EGAKNAVLP++AAS+L S+ K+++ +VP LSDV+T+N V+R 
Sbjct: 1 MEKIIVRGGQ-MGTVKVEGAKNAVLPVIAASLrASEEKSVICDVPTLSDVYTINEVLR 59 

Query: 61 GLDIRVDFNEAANEITVDASGHILDEAPYEYVSQMRASIVVLGPILARNGHAKVSMPGGC 120

L V F NE+TV+AS + EAP+EYV +MRAS++V+GP+LAR GHA+V++PGGC 
Sbjct: 60 HLGADWFEN--NEVTWASYALQTEAPFEYTOKMRASV1VMGPLLARTGHARVALPGGC 117 

Query: 121 TIGSRPINLHLKGLEAMGATITQKGGDITAQAD-RLQGAMIYMDFPSVGATQNLIMAATL 179

IGSRPI+ HLKG EAMGA I G I A+ RLQGA IY+DFPSVGAT+NL+MAA L
Sbjct: 118 AIGSRPIDQHLKGFEAMGAEIKVGKGFIEAEVKGRLQGAKIYLDFPSVGATENLIMAAAL 177

Query: 180 ADGVTTIENAAREPEIVDLAQFLNKMGARIRGAGTETLTITGVTHLRGVEHDVVQDRIEA 239

A+G TT+EN A+EPEIVDLA ++N MG +IRGAGT T+ I GV L GV+H ++ DRIEA
Sbjct: 178 AEGTTTLENVAKEPEIVDLANYINGMGGKIRGAGTGTIKIEGVEKLHGVKHHIIPDRIEA 237

Query: 240 GTFMVAAAMTSGNVL1RDAVWEHNRPLISKLMEMGVSOTEEEYGIRVQAOTPKLKPVTVK 299 

GTFMVAAA+T GNVL++ AV EH LI+K+ EMGV++ +E G+RV +LKP+ +K 

Sbjct: 238 GTFMVAAAITEGNVLVKGAVPEHLTSLIAKMEEMGVTIKDEGEGLRV-IGPKELKPIDIK 296

Query: 300 TLPHPGFPTDMQAQFTALMAVVNGESTMVETVFENRFQHLEEMRRMGLQSEILRETAMIH 359

T+PHPGFPTDMQ+Q AL+ +G S + ETVFENRF H EE RRM +1 + +1+ 
Sbjct: 297 TMPHPGFPTDMQSQMMALLLRASGTSMITETVFENRFMHAEEFRRMNGDIKIEGRSVIIN 356 

Query: 360 GGRQLQGAPVMSTDLRASAALILTGIVAQGVTIVNNLVHLDRGYYQFHEKLAKLGATISR 419

G QLQGA V +TDLRA AALIL G+VA+G TV L HLDRGY FH+KLA LGA I R 
Sbjct: 357 GPVQLQGAEVAATDLRAGAALILAGLVAEGHTRVTELKHLDRGYVDFHQKLAALGADIER 416 

Query: 420 SSE 422 

Sbjct: 417 VND 419 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 363/422 (86%) , Positives = 391/422 (92%) 

Query: 5 MDKIIVEGGQTQLQGQVVIEGAKNAVLPLIAATILPSOGKTLLTNVPILSDVFTMNNVVR 64
+DKII+EGGQT+L+G+VVIEGAKNAVLPL+AA+ILPS+GKT+L NVPILSDVFTMNNVVR
Sbjct: 1 VDKIIIEGGQTRLEGEVVIEGAKNAVLPLLAASILPSKGKTILRNVPILSDVFTMNNVVR 60

Query: 65 GLDIQVDFNCDKKEILVDASGDILDVAPYEFVSQMRASIVVLGPILARNGHAKVSMPGGC 124
GLDI+VDFN EI VDASG ILD APYE+VSQMRASIVVLGPILARNGHAKVSMPGGC



G TTIENAAREPEIVDLAQ LNKMGA+ + +GAGTETLTI GV L G EHDVVQDRIEAG



TFMVAAAMTSGNVL+ +DA+WEHNRPLISKLMEMGV V+EEE GIRV+A+T KLKPVTVKT



LPHPGFPTDMQAQFTALMAVVNGESTM+ETVFENRFQHLEEMRRMGLQ+EILR+TAMIHG



GR LQGAPVMSTDLRASAALIL G+VAQG T+V L HLDRGYYQFHEKLA LGA I RSSE
GRQLQGAPVMSTDLRASAALILTGIVAQGVTIVNNLVHLDRGYYQFHEKLAKLGATISRSSE 422










Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
Example 1409

A DNA sequence (GBSx1494) was identified in S.agalactiae <SEQ ID 4323> which encodes the amino
acid sequence <SEQ ID 4324>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2096 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23756 GB:AB009314 proton-translocating ATPase, epsilon
subunit [Streptococcus bovis] 
Identities = 102/138 (73%) , Positives = 121/138 (86%) , Gaps = 1/138 (0%) 

Query: 1 MAQLTVQVVTPDGIRYDHHASLITVRTPDGEMGILPGHINLIAPLNVHQMKINRSHQEG- 59

M +TVQVVTPDGIRYDHHA+ I+V+TPDGEMGILP HINLIAPL VH+MKI+R+
Sbjct: 1 MTFMTVQVVTPDGIRYDHHANFISVKTPDGEMGILPEHINLIAPLTVHEMKIHRTDDPNH 60

Query: 60 VDVVAVNGGIIEVVEDQVTIVADSAERARDIDLNRAERAKERAERALEKAQTTQNIDEMR 119

VDVVA+NGGIIE+ ++ VTIVADSAER RDID++RAERAK RAER LE+AQ+T +IDE+R
Sbjct: 61 VDVVAINGGIIEIKDNLVTIVADSAERERDIDVSRAERAKIRAERKLEQAQSTHDIDEVR 120

Query: 120 RAEVALRRAINRISVGKK 137 

RA+VALRRA+NRISVG K
Sbjct: 121 RAQVALRRALNRISVGNK 138 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4325> which encodes the amino acid 
sequence <SEQ ID 4326>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2539 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 100/138 (72%) , Positives = 119/138 (85%) , Gaps = 1/138 (0%) 

Query: 1 MAQLTVQVVTPDGIRYDHHASLITVRTPDGEMGILPGHINLIAPLNVHQMKINRSHQ-EG 59

M Q+TVQVVTPDGI+YDHHA I+V TPDGEMGILP HINLIAPL VH+MKI R + E
Sbjct: 1 MTQMTVQVVTPDGIKYDHHAKFISVTTPDGEMGILPNHINLIAPLQVHEMKIRRGGEDEK 60

Query: 60 VDVVAVNGGIIEVVEDQVTIVADSAERARDIDLNRAERAKERAERALEKAQTTQNIDEMR 119

VDV+A+NGGIIE+ ++ VTIVADSAER RDID++RAERAK RAER + +A+TT NIDE+R
Sbjct: 61 VDVIAINGGIIEIKDNVVTIVADSAERDRDIDVSRAERAKLRAEREIAQAETTHNIDEVR 120

Query: 120 RAEVALRRAINRISVGKK 137 

RA+VALRRA+NRI+V KK
Sbjct: 121 RAKVALRRALNRINVSKK 138 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1410

A DNA sequence (GBSx1495) was identified in S.agalactiae <SEQ ID 4327> which encodes the amino
acid sequence <SEQ ID 4328>. Analysis of this protein sequence reveals the following:

Possible site: 60 
>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein is similar to the beta subunit of the S. mutans ATPase:



Query: 1 MSSGKIAQVVGPVVDVVFASGDKLPEINNALIVYKNGDKSQKVVLEVALELGDGLVRTIA 60

MS+GKIAQVVGPVVDV FA+ DKLPEINNAL+VYK+GDKSQ++VLEVALELGDGLVRTIA
Sbjct: 1 MSTGKIAQVVGPVVDVAFATDDKLPEINNALVVYKDGDKSQRIVLEVALELGDGLVRTIA 60

Query: 61 MESTDGLTRGLEVLDTGRAISVPVGKDTLGRVFNVLGDAIDLEEPFAEDAERQPIHKKAP 120 

MESTDGLTRGLEV DTGRAISVPVGK+TLGRVFNVLGD IDL++PFAEDAERQPIHKKAP 
Sbjct: 61 MESTDGLTRGLEVFDTGRAISVPVGKETLGRVFNVLGDTIDLDKPFAEDAERQPIHKKAP 120 

Query: 121 SFDELSTSSEILETGIKVIDLLAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 180

SFD+LSTS+EILETGIKVIDLIiAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 
Sbjct: 121 SFDDLSTSTEILETGIKVIDLIAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 180 

Query: 181 VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE 240

VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE
Sbjct: 181 VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE 240 

Query: 241 GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQER1TSTKKGSVTSI 300 

GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTKKGSVTSI
Sbjct: 241 GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTKKGSVTSI 300 

Query: 301 QAIYVPADDYTDPAPATAFAHLDSTTNLERKLTQMGIYPAVDPLASSSRALTPEIVGDEH 360

QAIYVPADDYTDPAPATAFAHLDSTrNLER+LTQMGIYPAVDPLASSSRAL+PEIVG EH
Sbjct: 301 QAIYVPADDYTDPAPATAFAHLDSTXNLERRLTQMGIYPAVDPLASSSRALSPEIVGQEH 360

Query: 361 YEVATEVQRVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAETFTGQ 420 

Y+VATEVQ VLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAE FTGQ
Sbjct: 361 YDVATEVQHVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAEQFTGQ 420 

Query: 421 PGSYVPVEETVRGFKEILDGKHDQIPEDAFRIVGGIEDVIAKAEKM 466

PGSYVPV ETVRGFKEIL+GK+D++PEDAFR VG IEDV+ KA+KM
Sbjct: 421 PGSYVPVAETVRGFKEILEGKYDELPEDAFRSVGAIEDVVEKAKKM 466



A related DNA sequence was identified in S.pyogenes <SEQ ID 4329> which encodes the amino acid
sequence <SEQ ID 4330>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0275 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 440/468 (94%) , Positives = 456/468 (97%)

Query: 1 MSSGKIAQVVGPVVDVVFASGDKLPEINNALIVYKNGDKSQKVVLEVALELGDGLVRTIA 60
MSSGKIAQVVGPVVDV+FASGDKLPEINNALIVYK+ DK QK+VLEVALELGDG+VRTIA
Sbjct: 1 MSSGKIAQVVGPVVDVMFASGDKLPEINNALIVYKDSDKKQKIVLEVALELGDGMVRTIA 60

Query: 61 MESTDGLTRGLEVLDTGRAISVPVGKDTLGRVFNVLGDAIDLEEPFAEDAERQPIHKKAP 120

MESTDGLTRGLEVLDTGRAISVPVGK+TLGRVFNVLG+ IDLEEPFAED +RQPIHKKAP
Sbjct: 61 MESTDGLTRGLEVLDTGRAISVPVGKETLGRVFNVLGETIDLEEPFAEDVDRQPIHKKAP 120

Query: 121 SFDELSTSSEILETGIKVIDLLAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 180

SFDELSTSSEILETGIKVIDLLAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS
Sbjct: 121 SFDELSTSSEILETGIKVIDLLAPYLKGGKVGLFGGAGVGKTVLIQELIHNIAQEHGGIS 180

Query: 181 VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE 240

VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE
Sbjct: 181 VFTGVGERTREGNDLYWEMKESGVIEKTAMVFGQMNEPPGARMRVALTGLTIAEYFRDVE 240 

Query: 241 GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTKKGSVTSI 300

GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITST+KGSVTSI
Sbjct: 241 GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERITSTQKGSVTSI 300

Query: 301 QAIYVPADDYTDPAPATAFAHLDSTTNLERKLTQMGIYPAVDPLASSSRALTPEIVGDEH 360 

QAIYVPADDYTDPAPATAFAHLDSTTNLERKLTQMGIYPAVDPLASSSRAL+PEIVG+EH
Sbjct: 301 QAIYVPADDYTDPAPATAFAHLDSTTNLERKLTQMGIYPAVDPLASSSRALSPEIVGEEH 360 

Query: 361 YEVATEVQRVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAETFTGQ 420

Y VATEVQRVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAE FTG
Sbjct: 361 YAVATEVQRVLQRYRELQDIIAILGMDELSDEEKTLVGRARRIQFFLSQNFNVAEQFTGL 420

Query: 421 PGSYVPVEETVRGFKEILDGKHDQIPEDAFRMVGGIEDVIAKAEKMNY 468

PGSYVPV +TVRGFKEIL+GK+D++PEDAFR VG IEDVI KAEKM +
Sbjct: 421 PGSYVPVADTVRGFKEILEGKYDELPEDAFRSVGPIEDVIKKAEKMGF 468

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1411 

A DNA sequence (GBSx1496) was identified in S.agalactiae <SEQ ID 4331> which encodes the amino
acid sequence <SEQ ID 4332>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm Certainty=0.1889 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23754 GB:AB009314 proton-translocating ATPase, gamma subunit 
[Streptococcus bovis] 
Identities = 252/293 (86%), Positives = 278/293 (94%), Gaps = 2/293 (0%) 

Query: 1 MAGSLSEIKDKILSTEKTSKITSAMQMVSSAKLVKSEQAARDFQVYASKIRQITTNLLKS 60

MAGSLSEIK KI+ST+KTS IT AMQ VS+AKL KSEQAA+DFQVYASKIRQITT+LLKS
Sbjct: 1 MAGSLSEIKGKIISTQKTSHITGAMQKVSAAKLTKSEQAAKDFQVYASKIRQITTDLLKS 60

Query: 61 DLVSGSDNPMLSSRPVKKTGYIVITSDKGLVGGYNSKILKAMMDTITDYHTENDDYAIIS 120

+LV+GS NPML++RPVKKTGYIVITSDKGLVGGYNSKILKAMMD I +YH ++ +YAII+
Sbjct: 61 ELWGSKNPMLAARPVKRTGYIVITSDKGLVGGYKSKILKAMIffiLlEEYH-QDGNYAIIA 119 

Query: 121 IGSVGSDFFKARGMNVSFELRGLEDQPSFDQVGKIIAQAVEMYKNELFDELYVCYNHHVN 180

1G +G+DFFKARGMNV FELRGLEDQPSF+QVG IIA++VEMYKNEI<FDELWCYNHHVN 
Sbjct: 120 IGGIGADFFKARGMNWFELRGLEDQPSFEQVGi^IIAKSVE^KMELFDELWCYNHHVN 179 

Query: 181 SLTSQVRMQQMLPIKELDAEEASEDRVITGFELEPNREVILEQLLPQYTESLIYGAIIDA 240
SLTSQVR+QQMLPI ELDA+EA+E+ V +GFELEPNRE+ILEQLLPQYTESLIYGAI+DA
Sbjct: 180 SLTSQVRVQQMLPIAELDADEAAEEGV-SGFELEPNREMILEQLLPQYTESLIYGAIVDA 238

Query: 241 KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 293 

KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE
Sbjct: 239 KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 291

A related DNA sequence was identified in S.pyogenes <SEQ ID 4333> which encodes the amino acid 
sequence <SEQ ID 4334>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1969 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 251/293 (85%), Positives = 275/293 (93%), Gaps = 2/293 (0%) 

Query: 1 MAGSLSEIKDKILSTEKTSKITSAMQMVSSAKLVKSEQAARDFQVYASKIRQITTNLLKS 60

MAGSLSEIK KI+STEKTSKITSAM+MVSSAKLVKSEQAARDFQ+YASKIRQITT+LLKS
Sbjct: 1 MAGSLSEIKAKIISTEKTSKITSAMRMVSSAKLVKSEQAARDFQIYASKIRQITTDLLKS 60

Query: 61 DLVSGSDNPMLSSRPVKKTGYIVITSDKGLVGGYNSKILKAMMDTITDYHTENDDYAIIS 120

+L GSDNPML SRPVKKTGYIVITSDKGLVGGYNSKILK++MD IT+YH + DY IIS 
Sbjct: 61 ELTIGSDNPMLVSRPVKKTGYIVITSDKGLVGGYNSKILKSVMDMITEYHADG-DYEIIS 119 

Query: 121 IGSVGSDFFKARGMNVSFELRGLEDQPSFDQVGKIIAQAVEMYKNELFDELYVCYNHHVN 180 

IGSVGSDFFKARGMNV+FELRGL DQPSF+QV +II+Q+V+M+ NE+FDELYVCYNHHVN
Sbjct: 120 IGSVGSDFFKARGMNVAFELRGLADQPSFEQTOQIISQSVDMFVNEIFDELYVCYNHHVN 179



SLTSQVR+QQMLPI +L A+EA+E+ V TGFELEPNR IL+QLLPQ+TESLIYGAIIDA

Sbjct: 180 SLTSQVRVQQMLPISDLVADEAAEEGV-TGFELEPNRHDILDQLLPQFTESLIYGAIIDA 238 

Query: 241 KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 293

KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE
Sbjct: 239 KTAEHAAGMTAMQTATDNAKNVINDLTIQYNRARQAAITQEITEIVAGANALE 291

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1412 

A DNA sequence (GBSx1497) was identified in S.agalactiae <SEQ ID 4335> which encodes the amino
acid sequence <SEQ ID 4336>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1963 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1413 

A DNA sequence (GBSx1498) was identified in S.agalactiae <SEQ ID 4337> which encodes the amino
acid sequence <SEQ ID 4338>. Analysis of this protein sequence reveals the following:
Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3146 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to the alpha subunit of the proton-translocating ATPase from S.bovis: 

>GP:BAA23753 GB:AB009314 proton-translocating ATPase, alpha subunit
[Streptococcus bovis] Length = 501

Identities = 482/501 (96%) , Positives = 497/501 (98%) 

Query: 1 MAINAQEISALIKKQIEDFQPNFDVTETGIVTYIGDGIARARGLDNAMSGELLEFSNGAY 60 
MAINAQEISALIKKQIE+FQPNFDVTETG+VTYIGDGIARARGLDNAMSGELLEFSNGA+

Sbjct: 1 MAINAQEISALIKKQIENFQPNFDVTETGVVTYIGDGIARARGLDNAMSGELLEFSNGAF 60

Query: 61 GMAQNLESNDVGIIILGDFSEIREGDVVKRTGKIMEVPVGEAMIGRVVNPLGQPVDGLGE 120
GMAQNLESNDVGIIILGDFS IREGD VKRTGKIMEVPVGEA+IGRVVNPLGQPVDGLG+
Sbjct: 61 GMAQNLESNDVGIIILGDFSTIREGDEVKRTGKIMEVPVGEALIGRVVNPLGQPVDGLGD 120

Query: 121 IETTATRPVETPAPGVMQRKSVFEPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI 180 

I+TTATRPVETPAPGVMQRKSV EPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI
Sbjct: 121 IKTTATRPVETPAPGVMQRKSVSEPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI 180

Query: 181 DAILNQKGQDMICIYVAIGQKESTVRTQVETLRKYGALDYTIVVTASASQPSPLLFIAPY 240

DAILNQKGQDMICIYVAIGQKESTVRTQVETLRKYGALDYTIVVTASASQPSPLL+IAPY
Sbjct: 181 DAILNQKGQDMICIYVAIGQKESTVRTQVETLRKYGALDYTIVVTASASQPSPLLYIAPY 240

Query: 241 AGVAMAEEFMYNGKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER 300

AGVAMAEEFMYNGKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER
Sbjct: 241 AGVAMAEEFMYNGKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER 300

Query: 301 SAKVSDALGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG 360 
SAKVSDALGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG

Sbjct: 301 SAKVSDALGGGSITALPFIETQAGDISAYIATNVISITDGQIFLQENLFNSGIRPAIDAG 360

Query: 361 SSVSRVGGAAQIKAMKRVAGTLRLDLiASYRELEAFTQFGSDLDAATQAMjNRGRRTVEVL 420 
bacterial cytoplasm Certainty=0.3654 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 477/501 (95%) , Positives = 490/501 (97%) 

Query: 1 MAINAQEISALIKKQIEDFQPNFDVTETGIVTYIGDGIARARGLDNAMSGELLEFSNGAY 60
+AINAQEISALIKKQIE+FQPNFDVTETGIVTYIGDGIARARGLDNAMSGELLEF NGAY
Sbjct: 1 LAINAQEISALIKKQIENFQPNFDVTETGIVTYIGDGIARARGLDNAMSGELLEFENGAY 60



IETT RPVETPAPGVMQRKSV EPLQTGLKAIDALVPIGRGQRELIIGDRQTGKTSVAI 



DAILNQKGQDMICIYVAIGQKESTVRTQTOTLR+YGALDYTIWTASflSQPSPLLFIAPY 



AGVAMAEEFMY GKHVLIVYDDLSKQAVAYRELSLLLRRPPGREAYPGDVFYLHSRLLER

KQPDHKPLPVEKQWILYALTHGFLDDVPV+DIIAFEEALYDYFD HY++LFETIRTTKD

LPEEA LDAAI+AFK+

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1414 

A DNA sequence (GBSx1499) was identified in S.agalactiae <SEQ ID 4341> which encodes the amino
acid sequence <SEQ ID 4342>. Analysis of this protein sequence reveals the following: 

> N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1896 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23752 GB:AB009314 proton-translocating ATPase, delta subunit
[Streptococcus bovis] 
Identities = 98/178 (55%) , Positives = 127/178 (71%) 
Query: 1 MNKKTQALIEQYSKSLVEVAIEHKIVEKIQQEVAALIDIFETSELEGVLSSLAVSHDEKQ 60

M+KKTQAL+EQY+KSLVE+AIE + ++Q E AL+ +FE + L LSSL VS DEK 
Sbjct: 1 MDKKTQALVEQYAKSLVEIAIEiaSLAELQSETEALLSVFEETlILADFLSSLVVSRDEICV' 60 

Query: 61 HFVKTLQTSCSTYLVNFLEVIVQNEREALLYPILKSVDQELIKVNGQYPIQITTAVALSP 120 

V+ LQ S S Y+ NFLEVI +QNEREA L IL+ V ++ + Q+ I +TTAVAL+
Sbjct: 61 KLWLLQESSSVYMNNFLEVILQNEREAFLKAILEGVQKDFVIATNQHDIVVTTAVALTD 120 

Query: 121 EQKERLFDIAKTKIALPNGQLVEHIDPSIVGGFVVNANNKVIDASVRNQLHQFKMKLK 178

EQKER+ + K + G+LVE+ID SI+GGFV+N NNKVID S+R QL +FKM LK 
Sbjct: 121 EQKERIIALVAEKEGVKAGKLVENIDESILGGFVINVNNKVIDTSIRRQLQEFKMNLK 178

A related DNA sequence was identified in S.pyogenes <SEQ ID 4343> which encodes the amino acid 
sequence <SEQ ID 4344>. Analysis of this protein sequence reveals the following: 

> N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1668 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 86/178 (48%) , Positives = 125/178 (69%) 

Query: 1 MNKKTQALIEQYSKSLVEVAIEHKIVEKIQQEVAALIDIFETSELEGVLSSLAVSHDEKQ 60 

M KK QALIEQY+KSLVEVA EH ++ +Q +V A+++ F T+ L+ LSS AV H EK
Sbjct: 1 MTKKEQALIEQYAKSLVEVASEHHSLDALQADVIAlLETFTOraLDQSLSSQAVPHAEKI 60 

Query: 61 HFVKTLQTSCSTYLVNFLEVIVQNEREALLYPILKSVDQELIKVNGQYPIQITTAVALSP 120

+ L+ + S Y+ NFL +I+QNEREA LY +L++V E+ V+ QY + +T+++ L+ 
Sbjct: 61 KLLTLLKGNNSVYI^FI^ILQNEREAYLYQMLQAVIMIAIVSNQYDVTVTSSLPLTE 120 

Query: 121 EQKERLFDIAKTKIALPNGQLVEHIDPSIVGGFVVNANNKVIDASVRNQLHQFKMKLK 178

EQK R+ + K A+ G+L+E +DPS++GGF+++ NNKVID S+R QL FKM LK
Sbjct: 121 EQKSRVRAWAKKFAOTAGRLIEKVDPSLIGGFIISVNNKVIDTSIRRQLCAFKMNLK 178 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1415 

A DNA sequence (GBSx1500) was identified in S.agalactiae <SEQ ID 4345> which encodes the amino
acid sequence <SEQ ID 4346>. This protein is predicted to be ATP synthase b chain (atpF). Analysis of 
this protein sequence reveals the following: 



Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD13379 GB:U31170 ATPase, b subunit [Streptococcus mutans]
Identities = 103/165 (62%) , Positives = 130/165 (78%) 

Query: 1 MSILINSTTIGDIIIVSGSVLLLFILIKTFAWKQITGIFEAREQKIANDIDTAEQARQQA 60 

MS LIN T++G+++IV+GS +LL +L+K FAW Q+ IF+ RE+KIA DID AE +RQ A 
Sbjct: 1 MSTLINGTSLGNLLIVTGSFILLLLLVKKFAWSQLAAIFKTREEKIAKDIDDAENSRQNA 60
Query: 61 EAFATKREEELSNAKTEANQIIDNAKETGLAKGDQIISEAKTEADRLKEKAHQDIAQNKA 120

+ KR+ EL+ AK EA QIIDNAKETG A+ +II+EA EA RLK+KA+QDIA +KA 
Sbjct: 61 QVLENKRQVELNQAKDEAAQIIDNAKETGKAQESKIITEAHEEAGRLKDKANQDIATSKA 120 

Query: 121 EALADVKGEVADLTVLLAEKIMVSNLDKEAQSNLIDSYIKKLGDA 165

EAL+ VK +VADL+VLLAEKIM NLDK AQ +LIDSY+ KLGDA
Sbjct: 121 EALSSVKADVADLSVLLAEKIMRKNLDKTAQGDLIDSYLDKLGDA 165 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4347> which encodes the amino acid 
sequence <SEQ ID 4348>. Analysis of this protein sequence reveals the following: 

N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:



Query: 6 GELVGNFILVTGSVIVLLLLIKKFAWGAIESILQTRSQQISRDIDQAEQSRLSAQQLEAK 65 

G +GN ++VTGS I+LLLL+KKFAW + +I +TR ++I++DID AE SR +AQ LE K
Sbjct: 7 GTSLGNLLIVTGSFILLLLLVKKFAWSQLAAIFKTREEKIAKDIDDAENSRQNAQVLENK 66

Query: 126 KTEMSDLTVLLAEKIMGANLDKTAQSQLIDSYLDDLGEA 164 

K +++DL+VLLAEKIM NLDKTAQ LIDSYLD LG+A 
Sbjct: 127 KADVADLSVLLAEKIMAKNLDKTAQGDLIDSYLDKLGDA 165

An alignment of the GAS and GBS proteins is shown below. 

Identities = 81/156 (51%) , Positives = 115/156 (72%) 

Query: 10 IGDIIIVSGSVLLLFILIKTFAWKQITGIFEAREQKIANDIDTAEQARQQAEAFATKREE 69 

+G+ I+V+GSV++L +LIK FAW I I + R Q+I+ DID AEQ+R A+ K + 
Sbjct: 9 VGNFILVTGSVIVLLLLIKKFAWGAIESILQTRSQQISRDIDQAEQSRLSAQQLEAKSQA 68

Query: 70 ELSNAKTEANQIIDNAKETGIAKGDQIISEAKTEADRLKEKAHQDIAQNKAEALADVKGE 129

L ++ +A++II +AKE G +GD++++EA EA RLKEKA DI Q+K++A++ VK E
Sbjct: 69 NLDASRLQASKIISDAKEIGQLQGDKLVAEATDEAKRLKEKALTDIEQSKSDAISAVKTE 128

Query: 130 VADLTVLLAEKIMVSNLDKEAQSNLIDSYIKKLGDA 165

++DLTVLLAEKIM +NLDK AQS LIDSY+ LG+A
Sbjct: 129 MSDLTVLLAEKIMGANLDKTAQSQLIDSYLDDLGEA 164

SEQ ID 4346 (GBS169) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 34 (lane 6; MW 18 kDa).

The GBS169-His fusion product was purified (Figure 200, lane 11) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 250). These tests confirm that the protein is 
immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1416

A DNA sequence (GBSx1501) was identified in S.agalactiae <SEQ ID 4349> which encodes the amino
acid sequence <SEQ ID 4350>. Analysis of this protein sequence reveals the following: 



Possible site: 29

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-11.73 Transmembrane
INTEGRAL Likelihood = - Transmembrane 207 -
INTEGRAL Likelihood = - Transmembrane 78 -
INTEGRAL Likelihood = - Transmembrane 113 -
INTEGRAL Likelihood = - Transmembrane 174 -

Certainty=0.5692 (Affirmative)
Certainty=0.0000 (Not Clear)
Certainty=0.0000 (Not Clear)



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23750 GB:AB009314 proton-translocating ATPase, a subunit 
[Streptococcus bovis] 
Identities = 149/238 (62%) , Positives = 180/238 (75%) 

Query: 1 MESTSNPTVSFLGIDFDLTILAMSLLTITIIFILVFWASRKMTIKPKGKQNVLEYVYELV 60
ME++ NPT GI+FDLTILAMSLLT+ I F ++FWA+RKMT+KPKGKQN +EYVYE V

Sbjct: 1 METSVNPTAHVFGIEFDLTILAMSLLTVIISFGIIFWATRKMTLKPKGKQNFIEYVYEFV 60 

Query: 61 NNTISQNLGHYTKNYSLLMFILFSFVFIANNLGLMTSLKTHEHNFWTSPTANFGVDITLS 120

NTI NLG YT YSLLMF F F+ IANNLGL+ L++ ++NFWTSPT+ TO T S
Sbjct: 61 QNTIKPNLGEYTPKYSLLMFTFFFFILIANNLGLLVKLESEDVNFWTSPTSTIMVDCTWS 120

Query: 121 LLVAFICHIEGIRKKGIGGYLKGFLSPTPAMLPMNLLEEVTNVASLALRLFGNIFSGEVV 180

L+VA + H+EG+RKKG+ YLKG+LSP P MLPMN+LE+ TNV SLALRLFGNI++GEVV
Sbjct: 121 LIVAIVVHVEGWKKGVKAYLKGYLSPFPI^IMLPKNILEQFTNVLSLALRLFGNIYAGEVV 180 

Query: 181 TGLLLQLAVLSPFTGPLAFALNIVWTAFSMFIGFIQAYVFIILSSSYIGHKVHGDEEE 238

T L++ S PA ALN+ W AFS FIG IQAYVF ILSS YI K+ DE+E 

Sbjct: 181 TALIVGFGTKSLIFAPFALALNLAWVAFSAFIGCIQAYVFTILSSKYISEKLPEDEDE 238 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4351> which encodes the amino acid
sequence <SEQ ID 4352>. Analysis of this protein sequence reveals the following:



Possible site: 33

>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -4.73 Transmembrane - 95 ( 72 - 97)
INTEGRAL Likelihood = -4.35 Transmembrane - 131 ( 112 - 132)
INTEGRAL Likelihood = -2.13 Transmembrane - 216 ( 197 - 216)

Final Results

bacterial membrane --- Certainty=0.2890 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 124/239 (51%), Positives = 169/239 (59%), Gaps = 3/239 (1%) 



Query: 1 MESTSNPTVSFLGIDFDLTILAMSLLTITIIFILVFWASRKMTIKPKGKQNVLEYVYELV 60
ME P + I F+LT+LA+ ++TI I+F VFWASR+M +KP+GKQ LEY+ V
Sbjct: 1 MEEAKIPMLKLGPITFNLTLIAVCIVTIAIVFAFVFWASRQMKLKPEGKQTALEYLISFV 60



Query: 61 NNTISQNLGH-YTKNYSLLMFILFSFVFIANNLGLMTSLKT-HEHNFWTSPTANFGVDIT 118

+ ++L H K+YSLL+F +F FV +ANNLGL T L+T + +N WTSPTAN D+
Sbjct: 61 DGIGEEHLDH^QKSYSLLLFTIFLFmVANNMLFTKLETWGYNLWTSPTANLAFDLA 120 
Query: 119 LSLLVAFICHIEGIRKKGIGGYLKGFLSPTPAMLPMNLLEEVTNVASLALRLFGNIFSGE 178

LSL + + HIEG+R++G+ +LK +P P M PMNLLEE TN SLA+RLFGNIF+GE
Sbjct: 121 LSLFITLMVHIEGVRRRGLVAHLKRLATPWP-MTPMNLLEEFTNFLSLAIRLFGNIFAGE 179



Query: 179 VVTGLLLQLAVLSPFTGPLAFALNIVWTAFSMFIGFIQAYVFIILSSSYIGHKVHGDEE 237

WTGL++QLA + P+AF +N+ WTAFS+FI IQA+VF L+++Y+G KV+ EE 
Sbjct: 180 VWGLIVQIAOTRVYWWPIAFLV1#IAKTAFSVFISCIQAF\;FTKLTATYLGKKVNESEE 238 



A related GBS gene <SEQ ID 8803> and protein <SEQ ID 8804> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 1 
McG: Discrim Score: -3.50 
GvH: Signal Score (-7.5): -3.36 
Possible site: 29 
>>> Seems to have no N-terminal signal sequence

ALOM program count: 5 value: -11.73 threshold: 0.0 



INTEGRAL Likelihood =-11.73 Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
PERIPHERAL Likelihood = (at 156)
modified ALOM score: 2.85

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5692 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01818(301 - 1014 of 1314) 

GP|2662321|dbj|BAA23750.1||AB009314(1 - 238 of 239) proton-translocating ATPase, a subunit
{Streptococcus bovis} 
%Match =35.0 

%Identity =62.2 %Similarity =78.6 

Matches = 148 Mismatches = 51 Conservative Sub.s = 39 



XANCQTLMLPGVGFIERYFLRSICVYILSKIDDNLEKKEG*GLESTSNPTVSFLGIDFDLTILAMSLLTITIIFILVFWA 
= l = = III =11=111111111111= I I -III 
METSVNPTAHVFGIEFDLTILAMSLLTVIISFGIIFWA



SRKMTIKPKGKQNVLEYVYELVNNTISQNLGHYTKNYSLLMFILFSFW

mimiiini =11111 = 1 iii iii ii i him =i i = = iiiini= i = = = = iiiini= ii i 

TRKMTLKPKGKQNFIEYVYEFVQNTIKPNLGEYTPKYSLLMFTFFFFILIANNLGLLVKLESEDYNFWTSPTSTIMVDCT



LSLLVAFICHIEGIRKKGIGGYLKGFLSPTPAMLPMNLLEEVTNVASLALRLFGNIFSGEVVTGLLLQLAVLSPFTGPLA
11 = 11 = 1 = 11=1111= 1111 = 111 I llllhll: III llllllllll = = lllll l = = = I 1 = 1 
WSLIVAIVVHVEGWKKGVKAYLKGYLSPFPMKLPMNILEQFTNVLSLALRLFGN



FALNIVWTAFSMFIGFIQAYVFIILSSSYIGHKVHGDEEE*EKRGEICQYLLIVQRLVISLSYLALCFSYLS*LRLLHGN 

= in= i iii iii mill mi ii i= im 

LALNLAWVAFSAFIGCIQAYVFTILSSKYISEKLPEDEDET



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1417 

A DNA sequence (GBSx1502) was identified in S.agalactiae <SEQ ID 4353> which encodes the amino
acid sequence <SEQ ID 4354>. This protein is predicted to be ATP synthase c subunit (atpE). Analysis of
this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.62 Transmembrane 48 - 64 ( 42 - 65)

Final Results

bacterial membrane Certainty=0.284S (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA23749 GB:AB009314 proton-translocating ATPase, c subunit [Streptococcus bovis] 
Identities = 56/65 (86%) , Positives = 59/65 (90%) 

Query: 1 MNLAILALGFAVMGVSIGEGILVANIAKSAARQPEMFSKLQTLMFTGVAFIEGTFFVLFA 60

+NL ILALG AV+GVS+GEGILVANIAKSAARQPEMFSKLQTLMF GVAFIEGTFFVL A 
Sbjct: 2 I^KILALGLAVLGVSLGEGILVANIAKSAARQPEMFSKLQTLMFLGVAFIEGTFFVLLA 61 

Query: 61 FTFLV 65 
TF V

Sbjct: 62 STFFV 66 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4355> which encodes the amino acid 
sequence <SEQ ID 4356>. Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -5.26 Transmembrane 47 - 63 ( 41 - 64) 

Final Results 

bacterial membrane Certainty=0.3102 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAD00920 GB:AF001955 UncE [Streptococcus sanguinis]

Identities = 50/66 (75%) , Positives = 58/66 (87%) , Gaps = 1/66 (1%) 

Query: 1 MNPIF-AIALACFGVSLAEGFLMANLFKAASRQPEIIGQLRSLMILGVAFIEGTFFVTLV 59 
MN F L ACFGVS+AEG +M+NLFKAASRQPEIIGQLRSL+ILG+AF+EGTFFVTL 
Sbjct: 1 MNLTFLGLCFACFGVSIAEGLIMSNLFKAASRQPEIIGQLRSLLILGIAFVEGTFFVTLA 60

Query: 60 MAFILK 65 

MAF++K 
Sbjct: 61 MAFVIK 66 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 33/62 (53%) , Positives = 45/62 (72%) 

Query: 5 ILALGFAVMGVSIGEGILVANIAKSAARQPEMFSKLQTLMFTGVAFIEGTFFVLFAFTFLVR 66 
I AL A GVS+ EG L+AN+ K+A+RQPE+ +L++LM GVAFIEGTFFV F+++

Sbjct: 4 IFAIAIACFGVSLAEGFLMANLFKAASRQPEIIGQLRSLMILGVAFIEGTFFVTLVMAFILK 65



WO 02/34771 



PCT/GB01/04789 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1418 

A DNA sequence (GBSx1503) was identified in S.agalactiae <SEQ ID 4357> which encodes the amino
acid sequence <SEQ ID 4358>. Analysis of this protein sequence reveals the following:

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2562 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1419 

A DNA sequence (GBSx1504) was identified in S.agalactiae <SEQ ID 4359> which encodes the amino
acid sequence <SEQ ID 4360>. This protein is predicted to be bacterial glycogen synthase (glgA). Analysis
of this protein sequence reveals the following: 
Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.1574 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA19591 GB:D87026 bacterial glycogen synthase [Bacillus
stearothermophilus]
Identities = 220/475 (46%), Positives = 312/475 (65%), Gaps = 1/475 (0%) 






r M+PFLL+E+Y Y ++R VFTIHN++FQG F +L DL 



L GI+NGID + NPE D FL +S E K NK ALQ GLP+ +VPLI +V+ 
Query: 301 RLTDQKGFDIiaSELEJNMLQQDIQMVILGTGYHHFEETFSYFASRYPEKLSANITFDLRL 360

R+T QKG D++ M+ FE+ FS At TP K+ IF L 

Sbjct: 301 RMTAQKGLDLVTCTFHEMMSEDMQLV\fLGTGDMRFEQFFSQM^AAYPGKVGVYIGFHEPL 360

Query: 361 AQQIYAASDIFMMPSAFEPCGLSQMMAMRYG3LP1.VHEVGGLKDTVVAFNQFDGSGTGFS 420

A QIYA +D+F++PS FEPCGLSQM+A+RYG++P+V E GGL DTV ++13+ G GFS
Sbjct: 361 AHQIYAGADLFLIPSLFEPCGLSQMIALRYGTIPIVRETGGLMDTVQSYNEITKEGNGFS 420

Query: 421 FNHFSGYVILMQTLKLALEVYlTOYPEAWKKLQViQAMSKDFSWDTACVAYEQIjYQQL 475 
F +F+ + ++ T++ AL Y P W++L +AM D+SW + Y+Q Y+QL

Sbjct: 421 FTNFNAHDMLYTIRRALSFYRQ-PSVWEQLTERAMRGDYSWRRSANQYKQAYEQL 474 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1420 

A DNA sequence (GBSx1505) was identified in S.agalactiae <SEQ ID 4361> which encodes the amino
acid sequence <SEQ ID 4362>. This protein is predicted to be a subunit of ADP-glucose 
pyrophosphorylase. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3492 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA19590 GB:D87026 subunit of ADP-glucose pyrophosphorylase
[Bacillus stearothermophilus]

Identities = 53/178 (33%), Positives = 111/178 (62%), Gaps = 1/178 (0%) 

Query: 37 SAEIYVIDTPWLIEKMEEAQNHEPRKLRFLLRDLIVESNAIiAFEYTGYLSNISSIJCSYY 96 
S E+Y+++T L++ + + +M+ + ++RD + +EY+GY + I S++ Y+ 

Sbjct: 157 SLEMYLLET8LLLDL1ADY-KNHGYYSIVDVIRDYHRSLSICEYEYSGYAAVIDSVEQYF 215

Query: 97 DAMDMLTPNKFYSLFFSNQKVYTKVKN3EATYFDKQSKVSNSQLASGSIIKGYLDHSIV 156

++M++L + + LF + +YTKVK+E T + ++ NV S +A+G +I+G +++S++ 
Sbjct: 216 RSSMELIiDRDVWEQLFLPSHPIYTKVKDSPPTKYGREGEVKRSMIANGCTIEGTVENSVIj 275 


Query: 157 SRNCLLEKGTRvWSIIFPtCVKIGEGATIEOTIIDKCVKVASGVTLKGSLDKPLVIPK 214 

R+ + KG V NSII K +IG+G ++ IIDK KV GV LKG+ ++P ++ K 
Sbjct: 276 FRSVKIGKGAVVmSIIMQKCQIGDGCVLDGVIIDKDAKVEPGVVLKGTKEQPFIVRK 333 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1421 

A DNA sequence (GBSx1506) was identified in S.agalactiae <SEQ ID 4363> which encodes the amino
acid sequence <SEQ ID 4364>. This protein is predicted to be a subunit of ADP-glucose pyrophosphorylase
(glgC-1). Analysis of this protein sequence reveals the following: 
Possible site: 32

>>> Seems to have an uncleavable N-term signal seq.
Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9765> which encodes amino acid sequence <SEQ ID 9766> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA19589 GB:D87026 subunit of ADP-glucose pyrophosphorylase

[Bacillus stearothermophilus] 
Identities = 195/352 (55%) , Positives = 259/352 (73%) 



Query: 7 MKNEMLALIliAGGQGTRLGKLTQSIAKPAVQFGGRYRIIDFALSNCANSGINNVGVITQY 66 

MK + +A++IAGGQG+RL LT +IAKPAV FGG+YRIIDF LSNC NSGI+ VGV+TQY 
Sbjct: 1 MKKKCIAMLLAGGQGSRLRSLTTNIAKPAVPFGGKYRIIDFTLSNCTNSGIDTVGVLTQY 60 

Query: 127 vlILSGDHIYI<MOTDDMLQTHI<DNIASLTVAVI J DVPLKEASRFGIMNTDSNDRIVEFEEK 186 

VL+LSGDHIYKM+Y ML H A +T++V++VP +EASRFGIMNT+ IVEF EK 
Sbjct: 121 VXjVLSGDHIYKMDYQHMLDYHIAKQADWISVIEVPWEFASRFGIMNTNEEMEIVEFAEK 180 

Query: 187 PEHPKSTKASMGIYIFDWKRLRTVLIDGEKNGIDMSDFGKNVIPAYIiESGERVYTYNFDG 246 

P PKS ASMGIYIF+W L+ L N DFGK+VIP L +R + Y F+G 

Sbjct: 181 PAEPKSNLASMGIYIFJIWPLLKQYLQIDNANPHSSHDFGKDVIPMLLREKKRPFAYPFEG 240 



Query: 307 CFVAGNVEHSILSTNVQVKPNAIIKDSFWISGATIGEGAKINRAIIGEDAVI 358

C V G VE S+L V++ A++K+S +M GA + EGA + RAI+ D++I 
Sbjct: 301 CVVEGTVERSVLFQGVRIGKGAVVKESVIMPGAAVSEGAYVERAIVTPDSII 352 

There is also homology to SEQ ID 2660. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1422 

A DNA sequence (GBSx1507) was identified in S.agalactiae <SEQ ID 4365> which encodes the amino
acid sequence <SEQ ID 4366>. Analysis of this protein sequence reveals the following: 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2844 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA78440 GB:Z14057 1,4-alpha-glucan branching enzyme [Bacillus
caldolyticus] 

Identities = 272/616 (44%), Positives = 371/616 (60%), Gaps = 14/616 (2%) 
Query: 6 ELYTFGIGENFHLQNYLGVHSENGSFC 1 
Sbjct: 10 EWLFHEGRLYQSYELFGAHVIRGGGAVC-TRFCTvAPHJ^EWLVGSFMDWNGTNSPLTK 69

Query: 62 -WQAGVWEaNSLimREGDLYEfLVTRKGGQVVEKIDRMRVYMERRPGTASVIKVLRNKKW 120 

H GVW 4 EG JjYKY + G+V+ K DP AYE RP TAS44 1,4 +W 

Sbjct: 70 VITOEGVm'IWPENLEGHLYKYEIITPDGRVt.LKaDPYAFYSELRPHTASlVYDLKGYEW 129 

Query: 181 FMPLMAHPLDMSWGYQLMGyFAFEHTYGTPEEFQDFVEACHIQJNIGVLVDWVPGHFIQHD 240 

+PL+ HPLD SWGYQ GY++ YGTP +F FV+ CH+ +GV++DWVPGHF ++
Sbjct: 190 LLPLVEHPLDRSWGYQGTGYYSVTSRYGTPHDFMYFVDRCHQAGLGVIIDWVPGHFCKDA 249

Query: 241 DAIAYFDGTATYEYQIfflDRAHISrYRWGALNFDLGKSQVQSFLISSALFWIEHYHIDGIRVD 300 

L FDG TYEY N NY WG NFDLGK +V+SFLIS+ALFW+E+YH+DG RVD 

Sbjct: 250 HGLYMFDGAPTYEYAWEKDRE



Query: 301 AVStMLYLDYDEGPWEANQFGDfflOTLEGYHFLRKIWWIKERHPNVMMIAEESTASTPIT 360 

AV+NMLY ++ +E N FLR+LN+ + PNV MIAE+ST +T

Sbjct: 310 AVANNLYWPNNDRLYE---NPYAVEFIJIQMEAVFAYDPNVWMIAEDSTDWPRVT 361

Query: 361 KDLESGGLGFDFKWMGVMTOIIjRFYEEDPLYRQYDFNLVTFSFOTIFNENFVIAFSHDE 420 

GGLGF+4-KWNMGWMND+L++ E P R+Y W V+FS +Y ++ENF+I. FSHDE 
Sbjct: 362 APTYMGKJFNYlWMGWMMMLKyMETPPHERKYAHNCVSFSIiLYAYSENFILPFSHDE 421 

Query: 421 WIGKKSM^lHK^mGDRYKQFAGIJRNLYAYQMCHPGK^LFMGSEFGQFLEWKYKDQLEV^E 480 

WHGKKS+++KM G +FA +R LY Y M HPGKKLLFMGSEF QF EWK+ ++L+W
Sbjct: 422 WHGKKSLLNKMPGSYEEKFAQLRLLYGYXMAOT^ 481 

Query: 481 NliTODMNQKMQRYTKQLKQFYKDHKCLWRIDDSFDGLEIIDADHKSETVliSFIRKDDK-G 535 

+ ++++KM Y KQL YK +K + +D G E ID N +++ SFIR+ K G
Sbjct: 482 LFDFELHRKI-1DEYVKQLIACYKRYKPFYELDHDPRGFEWIDVHNAEQSIFSFIRRGKKEG 541 



Query: 600 HTLSFTLPALGASVKR 615 

+ + T+P G S+ R
Sbjct: 602 YHVRMTIPPFGISILR 617 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1423 

A DNA sequence (GBSx1508) was identified in S.agalactiae <SEQ ID 4367> which encodes the amino
acid sequence <SEQ ID 4368>. This protein is predicted to be pullulanase (pulA). Analysis of this protein 
sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3194 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44685 GB:U67061 pullulanase [Bacteroides thetaiotaomicron] 
Identities = 223/597 (37%) , Positives = 331/597 (55%) , Gaps = 55/597 (9%) 

Query: 139 EYSETKTAFRLWAPTAERVELILY ST ET ~' .VLSMKRG m AVNYKNHKENTOGVWFT 198 
Query: 199 ELEGMYMYQAYTYRVYYRRRTFKITRDPYSIATTANGKRSIVIAPEALTPEGFKISHGKE 258

+ + + YT+ V + T + A NGKR+ +1 ++ P+G++ + 
Sbjct: 94 WSKDLIGKFYTFOTKIDDKWQGOTPGINARAVGraGKRAAIIDWQSTWPDGWE SD 149 

Query: 259 AKWRLENPNQAVIYEMHTODFSISETSGVKTDYHGKFKGLHQKGTVNQHGDKTTFDYVQD 318 

+ L++P +IYEMH RDFS+ TSGVK GK+ L + GT+N T D++ + 

Sbjct: 150 TRPPLKSPADMIIYEMHHRDFSVDSTSGVKNK--GKYLALTEHGTMNSDKLLTGIDHLIE 207 

Query: 319 LGWYIQLQPIFDHHQTFDDD-GHYAYNWGYDPENYNVPEASFSSNPHEPATRILELKSA 377 

LGV ++ L P FD+ + +YNWGYDP+NYNVP+ S++++P++PATR+ E K 

Sbjct: 208 LGVTHVHLLPSFDYASVDETRIJ^SYNWGYDPQNYNVPDGSYATDPYQPATRVKEFKQM 267 

Query: 378 IQAYHmGIGVIMDVVYNHTFSSTDSAFQLTVPDYYYRMNHWGTFQNGSGCGMETASEKE 437 

+QA H AGI VIMDVVYNHTF++ +S F+ TVP Y+YR + T NGSGCGNETASE+
Sbjct: 268 VQALHKAGIRVIMDVVYNHTFNTDESNFERTVPGYFYRQKEDKTLANGSGCGNETASERL 327

Query: 438 MCRKYILDSVLYWVKEYNIDGFRFDLMGLHDVETMNIIRNELNKIDPRILVYGEGWDMGA 497 

M RK++++SVLYW+KEY++DGFRFDLMG+HD+ETMN IR +N +DP I +YGEGW A 
Sbjct: 328 MMRKFMVESVLYWIKEYHVDGFRFDLMGIHDIETMNEIRKAYNAVDPTICIYGEGWAAEA 387 

Query: 498 GLTPQNK-AKKDNAYQMPGIGFFNDDVRDAV- --KGAEIYGEFKKGLVSGNSTEDIVAKG 553 

P + A K N Q+PG+ F+D++RD + G+GFG+G EVG 
Sbjct: 388 PQYPADSLAMKGNIAQIPGVAVFSDELRDGLCGPVGDKRKGAFLAGIPGG EMSVKFG 444 

Query: 554 ILGSDE LVSYI DPSQVLNYVEAHDNYNMTOLLWELHPNDNEKQHIYR 600 

I G+ E V+Y P Q+++YV HD L D L P+ +Q I 

Sbjct: 445 IAGAIEHPQVQCDSVNYTQKPWAKCPVQMISWSCHDGLCLVDRLKASMPDITPEQLIRL 504 

Query: 601 VEVASAMNLLMQGMAFMQLGQEFLRTKCYPTGDKGQLTQADKERAMNSYNAPDQTOQVNW 660 

++A A+ QG+ F+ G+E +R DK+ NSY +PD VN ++W 

Sbjct: 505 DKLAQAWFTSQGIPFIYAGEEIMR DKQGVDNSYKSPDAVNAIDW 549 

Query: 661 DNVTFIIKSTINFIRKIITLKTNSPYFSYSSFEEIRKHVFVESAQYHSGFISFTVEEH 717

T + +++I L+ + P F ++RKH+ + S I+F +++H

Sbjct: 550 RRKTTSADVFMYYKRLIDLRKSHPAFRMGDAGQVRKHLEFLPVE-GSNLIAFRLKDH 605

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1424 

A DNA sequence (GBSx1509) was identified in S.agalactiae <SEQ ID 4369> which encodes the amino
acid sequence <SEQ ID 4370>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2368 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12492 GB:Z99107 similar to hypothetical proteins [Bacillus subtilis]
Identities = 151/293 (51%) , Positives = 193/293 (65%) , Gaps = 5/293 (1%) 

Query: 5 KKARLIYNPTSGQEIMKKNVAEVLDILEGFGYETSAFQTTPTKNSARDEATRAAQAGFDL 64 

K+AR+IYNPTSG+EI KK++A+VL E GYETS TT A A AA FDL 

Sbjct: 2 KEARIIYNPTSGREIFKKHI^QvLQKFEQAGYErSTHATT-CAGDATHAAKEAALREFDL 60 
Sbjct: 61 IIAAGGDGTINEWNGIAPLD^rai^ 120

Query: 125 VKMDIGQAQEDNYFINIAAAGSLTELTYSVPSQLKTTFGYLAYLAKGVELLPRVRKVPVK 184

+DIGQ YFINIA G LTELTY VPS+LKT G LAY KG+E+LP +R V+ 

Sbjct: 121 RPIDIGQVN-GQYFINIAGGGRLTELTYDVPSKLKTMLGQLAYYLKGMEMLPSLRPTEVE 179

Query: 185 ITHDKGEFIGDASMIFVAITNSVGGFEQIAPDAKLDDGKFTLILVKTANLIEIMHLIRLV 244

I +D F G+ + V +TNSVGGFE4+APD+- L+-DG F L+++K ANL E + + + 
Sbjct: 180 IEYDGKLFQGEIMLFLVTLTNSVGGFEKIAPDSSLNDGMFDLMILK1CAWLAEFIRVATM& 239 

Query: 245 LAGGKHINDKRVEYIKTSYLTIEPLSDERMMINLDGEYGGDAPITLANLKNHI 297

L G+HIND+ + Y K + + + E+M +NLDGEYGG P NL HI

Sbjct: 240 I.R-GEHIMX^irYTKAWVKVN--VSEKMQLHLDG3YGGMLPGEFVNLYRHI 289 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4371> which encodes the amino acid
sequence <SEQ ID 4372>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2S01 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 272/334 (81%) , Positives = 300/334 (89%)

Query: 1 MKKQKKARLIYNPTSGQEIMKKNVAEVLDILEGFGYETSAFQTTPTKNSARDEATRAAQA 60

MKKQ +ARLIYNPTSGQE+M+K+V EVLDILEGFGYETSAFQTT KNSA +EA RAA+A
Sbjct: 1 MKKQLRARLIYNPTSGQEL^KSVPEVLDILF^FGYETSAFQTTAQKNSALNEARRAAKA 60 

Query: 61 GFDLIVAAGGDGTINEVVNGIAPLKRRPKMAIIPTGTTNDFARALKIPRGNPIEATKLIG 120

GFDL++AAGGDGTINEVVNGIAPLK+RPKMAIIPTGTTNDFARALK+PRGNP +A KLIG
Sbjct: 61 GFDLLIAAGGDGTINEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIG 120

Query: 121 KNQIVKMDIGQAQSDKYFINIAAAGSLTELTYSVPSQLKTTFGYLAYLAKGVELLPRVRK 180

KNQ ++MDIG+A++D YFINIAAAGSLTELTYSVPSQLKT FGY+AYLAKGVELLPRV
Sbjct: 121 KNQTIQMDIGRAKKDTYFINIAAAGSLTELTYSVPSQLKTMFGYIAYLAKGVELLPRVSN 180

Query: 181 VPVKITHDKGEFIGDASMIFVAITNSVGGFEQIAPDAKLDDGKFTLILVKTANLIEIMHL 240

VPVKITHDKG F G SMIF AITNSVGGFE IAPDAKLDDG FTLIL+KTANL EI+HL 
Sbjct: 181 VPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIAPDAKLDDGMFTLILIKTANLFEIVHL 240 

Query: 241 IRLVLAGGKHINDKRVEYIKTSYLTIEPLSDERMMINLDGEYGGDAPITLANLKNHIRFF 300

+RL+L GGKHI D+RVEYIKTS + IEP +RMMINLDGEYGGDAPITL NLKNHI FF
Sbjct: 241 LRLILDGGKHITDRRVEYIKTSKIYIEPQCGKRMMINLDGEYGGDAPITLENLKNHITFF 300

Query: 301 ANTDEISDDALVLDKDELAIEAIAQKFANEVDDL 334

A+TD ISDDALVLD+DEL IE I +KFA+EV+DL
Sbjct: 301 ADTDLISDDALVLDQDELEIEEIVKKFAHEVEDL 334

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1425 

A DNA sequence (GBSx1510) was identified in S.agalactiae <SEQ ID 4373> which encodes the amino
acid sequence <SEQ ID 4374>. This protein is predicted to be DNA ligase (ligA-1). Analysis of this protein 
sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.27 Transmembrane 363 - 379 ( 363 - 379)

Final Results 

bacterial membrane Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9763> which encodes amino acid sequence <SEQ ID 9764>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.



Query: 2 ENRMNELVSLLNQYAKEYYTQDNPTVSDSQYDQLYRELVELEKQHPENILPNSPTHRVGG 61
+ R EL +N+Y+ EYYT D P+V D++YD+L +EL+ +E++HP+ P+SPT RVGG
Sbjct: 7 KQRAEELRRTINKYSYEYYTLDEPSVPDAEYDRLKQELIAIESEHPDLRTPDSPTQRVGG 66

P+ SL +AF+ ++L FD+RV+ AY ELKIDGL+VSL Y 



GATRGDG GE+ITENLK + +IPL +++ L I VRGE Y+PK



Query: 181 ANGEQEFANPRNAAAGTLRQLNTGIVAKRKIATFLYQEASPTQK--ETQDDVLKELESYG 238

N E+ FANPRNAAAG+LRQL+ I AKR + F+Y A + ETQ L L+ G
Sbjct: 187 KNEEEPFANPRNAAAGSLRQLDPKIAAKRNLDIFVYSIAELDEMGVETQSQGLDFLDELG 246



QEELGFT K+PRWA 



IAYKFPAEE ++L ++ VGRTGV+TPTA L PV++AGTTVSRA+LHN D I EKDIR 



I D VW KAGDIIP V+NV++ +R +E +P CP CGSELV EGEVALRCINP 



h +LF+ +L+ +VAD+Y+L+ E ++ L+ H 



+ +IQ SKENS E+LLFGLGIR +GSKA++ L F +L L +AS+E -i 



+G +A ++ T+F KEE+ +LL EL VN Y G K +D+ +G T+VLTGKL 



* LG K+TGSVSK TDL++AG AGSKLTKAQ+L I + +E+ L+ 
iy^GGKLTGSVSKNTDLVIAGFJUlGSKIiTKAQEMIEVlMEEQLM 663 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4375> which encodes the amino acid
sequence <SEQ ID 4376>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

»> Seems to have no N-terminal signal sequence 

i3 " 379 { 363 - 379) 
Final Results

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 472/652 (72%), Positives = 556/652 (84%) 

Query: 1 MENRMNELVSLLNQYAKEYYTQDNPTVSDSQYDQLYRELVELEKQHPENILPNSPTHRVG 60

M+ R+ EL LLN+Y +YYT+D P+VSDS YD+LYRELV LE+ +PE +L +SPT +VG 
Sbjct: 1 MKKRIKELTDLLNRYRYDYYTKDAPSVSDSDYDKLYRELVTLEQSYPEYVLQDSPTQQVG 60 

Query: 61 GLVLEGFEKYQHEYPLYSLQDAFSKEELIAFDKRVKAEFPTAAYMAELKIDGLSVSLTYV 120

G +L+GFEKY+H+YPL+SLQDAFS+EEL AFDKRVKAEFP A Y+AELKIDGLS+SL+Y 
Sbjct: 61 GTILKGFEKYRHQYPLFSLQDAFSREELDAFDKRVKAEFPNATYLAELKIDGLSISLSYE 120 

Query: 121 NGVLQVGATRGDGNIGENITENLKRVHDIPLHLDQSLDITVRGECYLPKESFEAINIEKR 180 

NG LQVGATRGDGNIGENITEN+K++ DIP L + L ITVRGE Y+ ++SF+AIN ++ 
Sbjct: 121 NGFLQVGATRGDGNIGENITENIKKIKDIPYQLSEPLTITVRGEAYMSRQSFKAINEARQ 180 

Query: 181 ANGEQEFANPRNAAAGTLRQLNTGIVAKRKIATFLYQEASPTQKETQDDVLKELESYGFS 240

NGE EFANPRNAAAGTLRQL+T +VAKR+IATFLYQEASPT + Q++VL EL GFS 
Sbjct: 181 ENGETEFANPRNAAAGTLRQLDTSVVAKRQIATFLYQEASPTARNQQNEVLAELADLGFS 240

Query: 241 VNHHRLISSSMEKIWDFIQTIEKDRVSLPYDIDGIVIKVNSIAMQEELGFTVKAPRWAIA 300 

VN + ++SSM++IWDFI+TIE R L YDIDG+VIKVNS+AMQEELGFTVKAPRWAIA 
Sbjct: 241 WPYYQLTSSNIDEIWDFIKTIEAKRDQIAYDIDGVVIKVNSLAMQEELGFTVKAPRWAIA 300

Query: 301 YKFPAEEKEAEILSVDWTVGRTGVVTPTANLTPVQIAGTTVSRATLHNVDYIAEKDIRIG 360 

YKFPAEEKEAEILSVDWTVGRTGVVTPTANLTPVQIAGTTVSRATLHNVDYIAEKDIRIG
Sbjct: 301 YKFPAEEKEAEILSVDWTVGRTGVVTPTANLTPVQIAGTTVSRATLHNVDYIAEKDIRIG 360

Query: 361 DTVVVYKAGDIIPAVLNVVMSKRNQQEVMLIPKLCPSCGSELVHFEGEVALRCINPLCPN 420

DTV+VYKAGDIIPAVLNVVMSKRNQQEVMLIPKLCPSCGSELVHFE EVALRCINPLCP+
Sbjct: 361 DTVIVYKAGDIIPAVLNVVMSKRNQQEVMLIPKLCPSCGSELVHFEDEVALRCINPLCPS 420

Query: 421 QIKERLAHFASRDAMNITGFGPSLVEKLFDAHLIADVADIYRLSIENLLTLDGIKEKSAT 480

I+ L HFASRDAMNITG GP++VEKLF A + DVADIY+L+ E+ + LDGIKEKSA
Sbjct: 421 LIQRSLEHFASRDAMNITGLGPAIVEKLFLAGFVHDVADIYQLTKEDFMQLDGIKEKSAD 480 

Query: 481 KIYHAIQSSKENSAEKLLFGLGIRHVGSKASRLLLEEFGNLRQLSQASQESIASIDGLGG 540 

K+ AI++SK NSAEKLLFGLGIRH+GSK SRL+LE +G++ L A +E IA IDGLG 
Sbjct: 481 KLIAAIEASKSNSAEKLLFGLGIRHIGSKVSRLILEVYGDISALLTAKEEEIARIDGLGS 540 

Query: 541 VIAKSLHTFFEKEEVDKLLEELTSYNVNFNYLGKRVSTDAQLSGLTVVLTGKLEKMTRNE 600

IA+SL +FE++ L++EL + VN +Y G++V++DA L GLTVVLTGKL ++ RNE
Sbjct: 541 TIAQSLTQYFEQKTAAILVDELKTAGVNMHYSGQKVNSDAALFGLTVVLTGKI^QI^NRNE 600 

Query: 601 AKEKLQNLGAKVTGSVSKKTDLIVAGSDAGSKLTKAQDLGITIQDEDWLLNL 652 

AK+KL+ LGAKVTGSVSKKTDL++AGSDAGSKL KA+ LGI I+DEDWL L 
Sbjct: 601 AKDKLEALGAKVTGSVSKKTDLVIAGSDAGSKLEKAKSLGIRIEDEDWLRQL 652 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
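Each alignment above is summarised by an Identities line and a Positives line: the number of identical (or positive-scoring) aligned positions divided by the alignment length, expressed as a whole-number percentage. As a minimal sketch, assuming the usual BLAST midline convention (a letter for an identical residue, "+" for a positive-scoring substitution, a space otherwise), the hypothetical helpers `percent` and `midline_stats` below recompute these figures. Truncating to a whole percent is an assumption, not the documented reporting rule, and the result can differ by one point from some of the Positives values reported above.

```python
from typing import Dict


def percent(count: int, aligned_length: int) -> int:
    """Whole-number percentage of an 'n/length' statistic (truncation is an assumed rule)."""
    return 100 * count // aligned_length


def midline_stats(midline: str, aligned_length: int) -> Dict[str, int]:
    """Recount identities and positives from a BLAST-style midline (hypothetical helper).

    Assumed convention: a letter marks an identical residue, '+' marks a
    positive-scoring substitution, and a space marks a mismatch or gap column.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    # Positives include the identical positions as well as the '+' columns.
    positives = identities + midline.count("+")
    return {
        "identities": identities,
        "positives": positives,
        "pct_identity": percent(identities, aligned_length),
        "pct_positive": percent(positives, aligned_length),
    }


if __name__ == "__main__":
    # Identities = 472/652 from the GAS/GBS ligase alignment above.
    print(percent(472, 652))  # 72
    # A toy five-column midline: three identities, one positive, one mismatch.
    print(midline_stats("MK+ L", 5))
    # {'identities': 3, 'positives': 4, 'pct_identity': 60, 'pct_positive': 80}
```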

Example 1426 

A DNA sequence (GBSx1511) was identified in S.agalactiae <SEQ ID 4377> which encodes the amino
acid sequence <SEQ ID 4378>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.63 Transmembrane 110 - 126 ( 108 - 128) 
INTEGRAL Likelihood = -2.13 Transmembrane 142 - 158 ( 141 - 159) 
INTEGRAL Likelihood = -1.12 Transmembrane 75 - 91 ( 75 - 93) 



WO 02/34771 



PCT/GB01/04789 



Final Results 

bacterial membrane --- Certainty=0.3251 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA68244 GB:X99978 citrulline cluster-linked gene [Lactobacillus 
plantarum] 

Identities = 56/158 (35%), Positives = 91/158 (57%), Gaps = 8/158 (5%) 



Query: 13 AIVTAIYIVLTITPPFNAIAYGAYQFRVSEMLNFIAFYHRKYLFAVTLGCMISNLYSFG- 71

A+V A+Y+VL + P ++A GA QFRVSE LN LA ++RKY++ + G ++ + + G
Sbjct: 13 ALVAAMYVVLCLGPAAFSIASGAIQFRVSEGLNLLAVFNRKYIWGIVAGVILFDAFGPGA 72

Query: 72 -MIDVFVGGGSTLLFVYLGTILFKQYQKDYLFNGLINKAFFFFSFFFAASMITVAVELKI 130

+++V GGG +LL + + T L + K L+N A F S F A MIT+ +
Sbjct: 73 SLLNVLFGGGQSLLALLVLTWLAPKL-KTVWQRMLLNIALFTVSMFMIALMITMM 126

Query: 131 VAGLPLLLTWLTTAVGELASLLVGAVLVDKLSRHVDFT 168

+G+ T+LTTA+ EL + + A ++ L R + F+
Sbjct: 127 SSGVAFWPTYLTTALSELIIMSITAPIMYSLDRVLHFS 164



A related DNA sequence was identified in S.pyogenes <SEQ ID 4379> which encodes the amino acid
sequence <SEQ ID 4380>. Analysis of this protein sequence reveals the following:



Possible site: 32
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -4 Transmembrane 75 - 91 ( 70 - 94)
INTEGRAL Likelihood = -3 Transmembrane 12 - 28 ( 8 - 28)
INTEGRAL Likelihood = -2 Transmembrane 141 - 157 ( 140 - 158)
INTEGRAL Likelihood = -0. Transmembrane 110 - 126 ( 110 - 126)
INTEGRAL Likelihood = -0.59 Transmembrane 55 - 71 ( 54 - 73)



Final Results

bacterial membrane --- Certainty=0.2763 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/167 (68%), Positives = 137/167 (81%), Gaps = 1/167 (0%) 
Query: 1 MNTFTTRDYAHMAIVTAIYIVLTITPPFNAIAYGAYQFRVSEMLNFIAFYHRKYLFAVTL 60

M T DY H+ +V A+Y+VLTITPP NAI+YG YQFR+SEM+NFIAFYHRKY+ AVTL
Sbjct: 1 MTKLTVHDYVHIGLVAALYVVLTITPPLNAISYGro^ 60

Query: 61 GCMISNLYSFGMIDVFVGGGSTLLFVYLGTILFKQYQKDYLFNGLINKAFFFFSFFFAAS 120

GCMI+N YSFG+ IDVFVGGGSTL+FV LG ILF +YQKDYLFNG+ NKAF +FSFFFA S
Sbjct: 61 GCMIANFYSFGLIDVFVGGGSTLIFVTLGVILFSKYQKDYLFNGIFNKAFVYFSFFFATS 120

Query: 121 MITVAVELKIVAGLPLLLTWLTTAVGELASLLVGAVLVDKLSRHVDF 167

M VA+EL G P LLTW TTA+GEL SLL+G++++DKLS+ + F
Sbjct: 121 MFNVAIELYFF-GAPFLLTWFTTALGELVSLLIGSLIIDKLSQRISF 166



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1427 

A DNA sequence (GBSx1513) was identified in S.agalactiae <SEQ ID 4381> which encodes the amino
acid sequence <SEQ ID 4382>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-11.20 Transmembrane 255 - 271 ( 245 - 281) 

INTEGRAL Likelihood =-10.72 Transmembrane 141 - 157 ( 132 - 165) 

INTEGRAL Likelihood = -8.17 Transmembrane 189 - 205 ( 185 - 208) 

INTEGRAL Likelihood = -7.01 Transmembrane 36 - 52 ( 33 - 60) 



Final Results 

bacterial membrane --- Certainty=0.5479 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



+AE+ S++ VAYYLL++ FPLL+ N+ PY 1+ 4

++ S G+L ++L AFW+ S+S+ +LQ A+NKA+GV+Q ++F++

+R ++PG +FST



A related DNA sequence was identified in S.pyogenes <SEQ ID 4383> which encodes the amino acid
sequence <SEQ ID 4384>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.58 Transmembrane 141 - 157 ( 132 - 168) 

INTEGRAL Likelihood =-12.15 Transmembrane 189 - 205 ( 177 - 210) 

INTEGRAL Likelihood =-11.68 Transmembrane 256 - 272 ( 245 - 280) 

INTEGRAL Likelihood = -7.54 Transmembrane 36 - 52 ( 33 - 60) 



Final Results 

bacterial membrane --- Certainty=0.6031 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA68244 GB:X99978 citrulline cluster-linked gene [Lactobacillus
plantarum] 

Identities = 53/170 (31%), Positives = 92/170 (53%), Gaps = 11/170 (6%) 

Query: 1 MTKLTVHDYWIGLVAALYVVLTITPPLNAISYGr^QFRISEMI^FLAFYHRKYIIAVTL 60 

MT+ + ++ LVAA+YVVL + P +++ G QFR+SE +N LA ++RKYI +
Sbjct: 1 MTQSKIRPWIINALVAAMYVVLCLGPAAFSIASGAIQFRVSEGLNLLAVFNRKYIWGIVA 60

Query: 61 GCMIANFYSFG--LIDVFVGGGSTLIFVTLGVILFSKYQKDYLFNGIFNKAFVYFSFFFA 118

G ++ + + G L++V GGG +L+ + + L K + ++ + + + F 

Sbjct: 61 GVILFDAFGPGASLLNVLFGGGQSLLALLVLTWLAPKLKT VWQRMLLNIA- LFT 113 

Query: 119 TSMFNVA--IELYFFGAPFLLTWFTTALGELVSLLIGSLIIDKLSQRISF 166

SMF +A I + G F T+ TTAL EL+ + I + I+ L + + F
Sbjct: 114 VSMFMIALMITMMSSGVAFWPTYLTTALSELIIMSITAPIMYSLDRVLHF 163 
!GB:AF071085 Orfde2 [Enterococcus faecalis] 175 2e-43 

>GP:AAC35915 GB:AF071085 Orfde2 [Enterococcus faecalis] 
Identities = 90/271 (33%) , Positives = 155/271 (56%) , Gaps = 3/271 (1%) 



W+ S+S+ +LQ A+NKA+G Q ++F 4

L+Y +4-PN K+ +R ILPG +F++ LS + G YV Y

+MLW F A I+ILGAI NA



An alignment of the GAS and GBS proteins is shown below. 
Identities = 188/302 (62%) , Positives = 244/302 (80%) 

Query: 1 MKLKKFFEDLLAKLEYRPIQVFMRHFQSAEMDLSAIAVAYYLLVTAFPLLVIAANIFPYF 60

M KK+F+ +L+K +Y PIQVFMRH QSAEMDLSAIAVAYYL++TAFPL+VIAANIFPY 
Sbjct: 1 MAEKKWFDKVLSKWQYEPIQVFKRHLQSAEMDLSAIAVAYYLILTAFPLIVIAANIFPYL 60 

Query: 61 HIWSDLLSLMQKWLPKWIYEPASRLAVDAFSKPSTGILGFASLTAFWTMSKSLTSLQKA 120 

+I+++DLL LM++NLPK+I+ PAS + + FSKPS +LG A+LT WTMS+SLTSLQKA 
Sbjct: 61 NIDIADLLRLMKQNLPKDIFRPASAIVENIFSKPSGSVLGVATLTGLWTMSRSLTSLQKA 120 

Query: 121 INKAYGVDQHRDFVISRLVGVGTGLIILFLLTFVLIFSTFSKPVLQIIVNMYDLGDTLTA 180 

INKAYG QHRDF I LVG+ T LIILFLL F LIFS FSK +Q++ Y L D +T 
Sbjct: 121 INKAYGASQHRDFFIGHLVGLLTSLIILFLLAFALIFSIFSKAAIQVLDKHYHLSDNITT 180 

Query: 181 WLLNLAQPVTFLTIFLGIGILYFILPNARIRKVRYVIPGTLFSTFVIGFFSNLISQYVLN 240

L L QP+T L IF+G+ +LYF+LPN +I+K+RY++PGTLF++FV+ F SNL+ YV+ 
Sbjct: 181 IFLLLIQPITVLIIFVGLMLLYFLLPNVKIKKIRYILPGTLFTSFVMTFLSNLVGNYVVY 240

Query: 241 RVEKMVDIKTFGSVVIFILMLWFIFLAHIMILGAILNASVQEIATGKIESRRGDIMSLIQ 300

VE+MVDIK FGSV+IFI+MLWFIFLA I+ILGAI NA+ QE++ GK+E R GD++++++
Sbjct: 241 NVERMVDIKMFGSVMIFIIMLWFIFLARILILGAIENATYQEMSLGKLEGRSGDMIAILK 300 

Query: 301 KS 302
K+ 

Sbjct: 301 KT 302

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1428 

A DNA sequence (GBSx1514) was identified in S.agalactiae <SEQ ID 4385> which encodes the amino
acid sequence <SEQ ID 4386>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4200 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1429 

A DNA sequence (GBSx1515) was identified in S.agalactiae <SEQ ID 4387> which encodes the amino
acid sequence <SEQ ID 4388>. This protein is predicted to be methionine aminopeptidase (map). Analysis 
of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2342 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9761> which encodes amino acid sequence <SEQ ID 9762> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC35914 GB:AF071085 methionine aminopeptidase A [Enterococcus 
faecalis] 

Identities = 101/207 (48%) , Positives = 128/207 (61%) , Gaps = 31/207 (14%) 

Query: 1 MITLKSAREIEAMDRAGDFLASIHIGLRDIIKPGVDMWEVEEYVRRRCKEENVLPLQIGV 60

MITLKS REIE MD +G+ LA +H LR IKPG+ W++E +VR + + QIG
Sbjct: 1 MITLKSPREIEMMDESGELLADVHRHLRTFIKPGITSTOIEVFVRDFIESHGGVAAQIGY 60

Query: 61 DGAVMDYPYATCCGLNDEVAHAFPRHYTLKQGDLLKVDMVLSEPLDKSIVDVSSLNFDNV 120

+G Y YATCC +NDE+ H FPR LK GDL+KVDM +
Sbjct: 61 EG YKYATCCSINDEICHGFPRKKVLKDGDLIKVDMCVD 98


A related DNA sequence was identified in S.pyogenes <SEQ ID 4389> which encodes the amino acid 
sequence <SEQ ID 4390>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.2082 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 256/286 (89%), Positives = 273/286 (94%)



Query: 1 MITLKSAREIEAMDRAGDFLASIHIGLRDIIKPGVDMWEVEEYVRRRCKEENVLPLQIGV 60

MITLKSAREIEAMDRAGDFLA IHIGLRDIIKPGVDMWEVE YVRRRCKE+NVLPLQIGV 
Sbjct: 1 MITLKSAREIEAMDRftGDFLRGIHIGLRDIIKPGVDMWEVEAYVRRRCKEDNVLPLQIGV 60 

Query: 61 DGAVMDYPYATCCGLNDEVAHAFPRHYTLKQGDLLKVDMVLSEPLDKSIVDVSSLNFDNV 120

DG +MDYPYATCCGLNDEVAHAFPRHY LK+GDLLKVDMVLSEPLDKSIVDV++L+FDNV 
Sbjct: 61 DGHM^YPYATCCGLNDEVAHAFPRHYILKEGDLLKVDMVLSEPLDKSIVDVAALDFDNV 120

Query: 121 AQMKKYTETYSGGLADSCWAYAVGEVSQEVKDLMSVTREAMYIGIEKAVIGNRIGDIGAA 180 

+MKK+T +Y+GGLADSCWAYAVG S E+K LM VT+EAMY GIEKAVIGNRIGDIGAA 
Sbjct: 121 PEMKKWTGSYTGGLADSCWAYAVGTPSDEIKQLMDVTKEAMYRGIEKAVIGNRIGDIGAA 180 

Query: 181 IQDYAESRGYGWRDLVGHGVGPTMHEEPMVPNYGTAGRGLRLREGMVLTIEPMINTGTW 240 

+Q+YAES GYGWRDLVGHGVGPTMHEEPMVPNYGTAGRGLRL+EGMVLT+EPMINTGTW 
Sbjct: 181 VQEYAESFGYGWRDLVGHGVGPTMHEEPMVPNYGTAGRGLRLKEGMVLTVEPMINTGTW 240

Query: 241 EIDTDMKTGWAHKTLDGGLSCQYEHQFVITKDGPVILTSQGEERTY 286 

EIDTD+KTGWAHKTLDGGLSCQYEHQFVITKDGPVILTSQGEERTY 
Sbjct: 241 EIDTDIKTGWAHKTLDGGLSCQYEHQFVITKDGPVILTSQGEERTY 286

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1430 

A DNA sequence (GBSx1516) was identified in S.agalactiae <SEQ ID 4391> which encodes the amino
acid sequence <SEQ ID 4392>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3473 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9759> which encodes amino acid sequence <SEQ ID 9760> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06894 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 158/431 (36%) , Positives = 270/431 (61%) , Gaps = 6/431 (1%) 



+K K NIEKLTYAE+ I D QV+ G +GL K ++F IGAM + +Y+ G LLIVG

A+L+TGGF+ S +LAD+L +PV+ T YDTFTV+TMIN

R S+A ++ M++E + ++PV+

Query: 305 VENLSMSQ GTDLYTYSDQILSNLQIEDG-HFSFLVEPAMIDHTGSLTQGVLTEFL 359

++LMQ G+ L+ +G + +PM + G+++ GV+T +
Sbjct: 303 LKALQMIQRQPHVGETIEDLMTNGLNESSSDQGDSYEVEITPQMTNQLGTISHGVMTSLV 362

Query: 360 KEICIRVLTRKHQRSIVVKQMTLYFLQPVQIDEIIMVTPTIISEKRREATLDLELKLENK 419

E RVL + + +VV+ +TLYFL+PVQID + + P ++ R+ +D+E+ E +
Sbjct: 363 IESGSRVLRKYKKGDLVVENITLYFLKPVQIDSRLTIRPRVLEIGRKHGKIDVEMYHEGE 422

Query: 420 IIAKAMIAVKI 430

I + KA+ +I
Sbjct: 423 IVGKALFMAQI 433



A related DNA sequence was identified in S.pyogenes <SEQ ID 4393> which encodes the amino acid 
sequence <SEQ ID 4394>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3011 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 267/431 (61%) , Positives = 351/431 (80%) 

Query: 1 MIIVMSKHQEILEYLENLAVGKRVSVRSISNHLKVSDGTAYRAIKEAENRGIVETRPRSG 60

+II+MSKHQ+IL+YLE LA+GK+VSVRSISNHLKVSDGTAYRAIKEAENRGIVET+PRSG
Sbjct: 1 VIIIMSKHQDILDYLEKIAIGKKVSVRSISNHLKVSDGTAYRAIKEAENRGIVETKPRSG 60 

Query: 61 TVRVAQKAKVNIEKLTYAEIARISDSQVVAGIEGLSKEFSKFSIGAMTHRNIEKYLVQGG 120

TVR+ +K +V I++LTY+EIARISDS+V+AG GL EFS+FSIGAMT +NI +YLV+GG
Sbjct: 61 TVRIEKKGRVRIDRLTYSEIARISDSEVLAGHAGLGHEFSRFSIGAMTQQNIRRYLVKGG 120 

Query: 121 LLIVGDRDEIQHLALQHQNAILOTGGFWSPSVCR^KLQIPVMvTfTfDTFTVSTMINH 180 

LLIVGDR+ IQ IAL++ NAILVTGGF VS V +A+ +IPVMVTHYDTFTV+TMINH 
Sbjct: 121 LLIVGDRETIQLLALENHNAILVTGGFPVSKRVIEMA^QRIPVIWTHYDTFTVATMINH 180 

Query: 181 TLSNAKIRTDLKTVEQWQSQMDYGFLAQDDTVKEFTOjLVKQTKNVRFPIVNQANVWGV 240 

LSN +I+TDLKTVEQV DYG+L +D +V+EFN L+K+T+ VRFP+++ V+GV 

Sbjct: 181 ALSNIRIKTDLKTVEQVMIPITDYGYLCEDSSVEEFNTLIKKTRQVRFPVLDYICRKVIGV 240 

Query: 241 VSVQDILGKDKEVKLATWSKNIIVAICPRMSLANISQKMIFEDLNMMPWSDDFELLGVI 300 

VS++D++ + KL VMSKN I A+P SLANISQKMIFEDLNM+PV ++ LLG+I
Sbjct: 241 VSMRDVVDQLPTTKLTKVMSKNPITARPNTSLANISQKMIFEDLNMLPVTDEENNLLGMI 300

Query: 301 TRRQAVENLSMSQGTDLYTYSDQILSNLQIEDGHFSFLVEPAMIDHTGSLTQGVLTEFLK 360 

TRRQA+ENL Q + YTYS+QILSNL+ ++ +VEP MID G+++ GV++EFLK 
Sbjct: 301 TRRQAMENLPNHQPNNPYTYSEQILSNLEETVDYYQVVVEPTMIDSAGNMSNGVISEFLK 360 

Query: 361 EICIRVLTRKHQRSIVVKQMTLYFLQPVQIDEIIMVTPTIISEKRREATLDLELKLENKI 420

EI IR LT+KHQ++I+++QM +YFL +QI++ + + P II+E RR +T+D+E+ +++++ 
Sbjct: 361 EISIRALTKKHQKNIIIEQMMVYFLHAIQIEDELKIYPKIITENRRSSTIDIEIFVDDQV 420 

Query: 421 IAKAMIAVKIN 431 

IAKA+I KIN 
Sbjct: 421 IAKAIITTKIN 431 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1431

A DNA sequence (GBSx1517) was identified in S.agalactiae <SEQ ID 4395> which encodes the amino
acid sequence <SEQ ID 4396>. Analysis of this protein sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2837 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04556 GB:AP001510 unknown conserved protein [Bacillus halodurans]
Identities = 56/185 (30%), Positives = 86/185 (46%), Gaps = 4/185 (2%) 

Query: 7 MDIWTNLGRFAFIETEHVVLRPVAYTDREAFWRIASKRTNLQFI-FPVQTSKKESDFLLV 65

M+I G +ETE + LR D A + AS +++ + S K+S+ L 

Sbjct: 1 MEIEDIYGDLPTLETERLRLRKFYKDDAAAIYDYASNEQVTKYV1WETHQSIKDSEAFLA 60 

Query: 66 HSFMK EPLGVWAIEDKVSHKMFGVIRFENIDLSKKTAEIGYFLKESSWGQGIMTECL 122 

+ K + + WAIE K + +M G + F KTAE+GY L E WGQGIMTE + 

Sbjct: 61 FALNKYDEKDVSPWAIELKKNERMIGTvDFVVTOKPKDKTAELGYVIjSEPYWGQGIMTEAV 120 

Query: 123 KTLSFFAFREFGMDKLIIVTHKENIASQKVALKAHFKQSRSFKGSDRYTRRIRDYIEFQL 182 

L F F ++++ ENI+S +V KA + + + RD+ + + 

Sbjct: 121 I^VEFGFNimELERIQAKCFAENISSARVMEKAGLIYEGTHRRAIWKGAHRDFKVYAI 180

Query: 183 TRGDY 187 
R DY 

Sbjct: 181 IREDY 185 

A related DNA sequence was identified in S.pyogenes <SEQ ID 667> which encodes the amino acid 
sequence <SEQ ID 668>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1096 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/177 (53%) , Positives = 117/177 (65%) 

Query: 7 MDIWTNLGRFAFIETEHVVLRPVAYTDREAFWRIASKRTNLQFIFPVQTSKKESDFLLVH 66

MDIWT L FAF ET V LRP YD F+ + + NL ++FP Q +K SD+LLVH 
Sbjct: 1 MDIWTK^VFAFFETPKVILRPFRYEDHWDFYSMVNDTKNLYYVFPEQKTKAASDYLLVH 60 



Query: 67 

SF+K PLG WAIEDK +H++ G IR E+ D + A+IGYFL 
Sbjct: 61 SFIKFPLGQWAIEDKATHQVIGSIRIEHYDAKTRCBDIGYFnNYAFWGQGIMTEWIKLV 120 

Query: 127 FFAFREFGMDKLIIVTHKENIASQKVALKAHFKQSRSFKGSDRYTRRIRDYIEFQLT 183 

+ +F EFG+ L I+TH EN ASQKVA KA F+ FKGSDR T +I Y +QLT
Sbjct: 121 YLSFHEFGLKTLRIITHLENKASQKVAKKAGFQLKTCFKGSDRNTHKICIYKMYQLT 177

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1432

A DNA sequence (GBSx1518) was identified in S.agalactiae <SEQ ID 4397> which encodes the amino
acid sequence <SEQ ID 4398>. This protein is predicted to be UDP-N-acetylglucosamine-1-carboxyvinyl
transferase (murA). Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.63 Transmembrane 25 - 41 ( 24 - 42)

Final Results

bacterial membrane --- Certainty=0.3251 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF86297 GB:AF072894 UDP-N-acetylglucosamine-1-carboxyvinyl

transferase [Listeria monocytogenes] 
Identities = 240/412 (58%) , Positives = 303/412 (73%) , Gaps = 2/412 (0%) 

Query: 3 KIIINGGKQLTGEVAVSGAKNSVVALIPATILADDVVVLDGVPAISDVDSLVDIMETMGA 62

K+II GGK+L G + V GAKNS VALIPA ILA+ VVL+G+P ISDV +L +I+E +G
Sbjct: 20 KLIIRGGKKLAGTLQVDGAKNSAVALIPAAIIAESEVVLEGLPDISDVHTLYNILEELGG 79



Query: 63 KIKRYGETLEIDPCGVKDIPMPYGKINSLRASYYFYGSLLGRYGQATLGLPGGCDLGPRP 122

++ +T IDP + +P+P G + LRASYY G++LGR+ +A +GLPGGC LGPRP
Sbjct: 80 TVRYDNKTAVIDPTDMISMPLPSGNVKKLRASYYLMGAMLGRFKKAVIGLPGGCYLGPRP 139




Query: 183 VIENAAREPEIIDVATLLNNMGAHIRGAGTDVITIEGVKSLHGTRHQVIPDRIEAGTYIA 242

VIENAA+EPEIIDVATLL NMGA I+GAGTD I I GV+ LHG H +IPDRIEAGT++ 
Sbjct: 198 VIENAAKEPEIIDVATLLTNMGAIIKGAGTDTIRITGVEHLHGCHHTIIPDRIEAGTFMV 257 

Query: 243 MAAAIGRGIKVTNVLYEHLESFIAKLDEMGVRMTVEEDSIFVEEQERLKAVSIKTSPYPG 302

+AAA G+G+++ NV+ HLE IAKL EMGV M +EED+IFV E E++K V IKT YPG 
Sbjct: 258 IJWISGKGVRIENVIPTHLEGIIAKLTEMGVPMDIEEDAIFVGEVEKIKKVDIKTYAYPG 317 



Query: 303 FATDLQQPLTPLLLTAEGNGSLLDTIYEKRVNHVPELARMGANISTLGGKIVYSGPNQLS 362

F TDLQQPLT LL AEG+ + DTIY R H+ E+ RMG G V +GP QL 

Sbjct: 318 FPTDLQQPLTALLTRAEGSSVITDTIYPSRFKHIAEIERMGGKFKLEGRSAVINGPVQLQ 377 

G+ V ATDLRAGAALVIA L+A+G TEI +E I RGYS IIEKL+++GA+I 
Sbjct: 378 GSKVTATDLRAGAALVIAALLADGETEIHGVEHIERGYSKIIEKLSAIGANI 429



A related DNA sequence was identified in S.pyogenes <SEQ ID 4399> which encodes the amino acid 
sequence <SEQ ID 4400>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -8.70 Transmembrane 25 - 41 ( 23 - 45) 

Final Results 

bacterial membrane --- Certainty=0.4482 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAF86297 GB:AF072894 UDP-N-acetylglucosamine-1-carboxyvinyl
transferase [Listeria monocytogenes]

Identities = 244/412 (59%) , Positives = 302/412 (73%) , Gaps = 2/412 (0%) 
Query: 3 KIIINGGKALSGEVAVSGAKNSVVALIPAIILADDIVILDGVPAISDVDSLIEIMELMGA 62

K+II GGK L+G + V GAKNS VALIPA ILA+ V+L+G+P ISDV +L I+E +G
Sbjct: 20 KLIIRGGKKLAGTLQVDGAKNSAVALIPAAIIAESEVVLEGLPDISDVHTLYNILEELGG 79

Query: 63 TVNYHGDTLEIDPRGVQDIPMPYGKINSLRASYYFYGSLLGRFGQAVVGLPGGCDLGPRP 122

TV Y T IDP + +P+P G + LRASYY G++LGRF +AV+GLPGGC LGPRP
Sbjct: 80 TVRYDNKTAVIDPTDMISMPLPSGNVKKLRASYYLMGAMLGRFKKAVIGLPGGCYLGPRP 139

Query: 123 IDLHLKAFEAMGVEVSYEGEIMNLSTNGQKIHGAHIYMDTVSVGATINTMVAATKAQGKT 182

ID H+K FEA+G +V+ E + L + ++ GA IY+D VSVGATIN M+AA +A+GKT
Sbjct: 140 IDQHIKGFEALGAKVTPTEQGAIYLRAD- -ELKGARIYLDWSVGATINIMIAAVRAKGKT 197 

Query: 183 VIENAAREPEIIDVATLLNNMGAHIRGAGTDIITIQGVQKLHGTRHQVIPDRIEAGTYIA 242

VIENAA+EPEIIDVATLL NMGA I+GAGTD I I GV+ LHG H +IPDRIEAGT++
Sbjct: 198 VIENAAKEPEIIDVATLLTNMGAIIKGAGTDTIRITGVEHLHGCHHTIIPDRIEAGTFMV 257

Query: 243 LAAAIGKGVKITNVLYEHLESFIAKLEEMGVRMTVEEDAIFVEKQESLKAITIKTSPYPG 302
LAAA GKGV+I NV+ HLE IAKL EMGV M +EEDAIFV + E +K + IKT YPG 

Query: 303 FATDLQQPLTPLLLKADGRGTIIDTIYEKRINHVPELMRMGADISVIGGQIVYQGPSRLT 362 

F TDLQQPLT LL +A+G I DTIY R H+ E+ RMG + G V GP +L
Sbjct: 318 FPTDLQQPLTALLTRAEGSSVITDTIYPSRFKHIAEIERMGGKFKLEGRSAVINGPVQLQ 377

Query: 363 GAQVKATDLRAGAALVTAGLIAEGKTEITNIEFILRGYASIIAKLTALGADI 414

G++V ATDLRAGAALV A L+A+G+TEI +E I RGY+ II KL+A+GA+I
Sbjct: 378 GSKVTATDLRAGAALVIAALLADGETEIHGVEHIERGYSKIIEKLSAIGANI 429

An alignment of the GAS and GBS proteins is shown below. 

Identities = 344/419 (82%) , Positives = 394/419 (93%) 

Query: 1 MRKIIINGGKQLTGEVAVSGAKNSVVALIPATILADDVVVLDGVPAISDVDSLVDIMETM 60

MRKIIINGGK L+GEVAVSGAKNSVVALIPA ILADD+V+LDGVPAISDVDSL++IME M
Sbjct: 1 MRKIIINGGKALSGEVAVSGAKNSVVALIPAIILADDIVILDGVPAISDVDSLIEIMELM 60

Query: 61 GAKIKRYGETLEIDPCGVKDIPMPYGKINSLRASYYFYGSLLGRYGQATLGLPGGCDLGP 120 

GA + +G+TLEIDP GV+DIPMPYGKINSLRASYYFYGSLLGR+GQA +GLPGGCDLGP 
Sbjct: 61 GATVNYHGDTLEIDPRGVQDIPMPYGKINSLRASYYFYGSLLGRFGQAVVGLPGGCDLGP 120

Query: 121 RPIDLHLKAFFAMGASVSYEGDSMRIA'TIIGKPIjQGANIYMDTVSVGATINTIIAAAKANG 180 

RPIDLHLKAFEAMG VSYEG++M L+TNG+ + GA+IYMDTVSVGATINT++AA KA G
Sbjct: 121 RPIDLHLKAFEAMGVEVSYEGEIMNLSTNGQKIHGAHIYMDTVSVGATINTMVAATKAQG 180

Query: 181 RTVIENAAREPEIIDVATLLNNMGAHIRGAGTDVITIEGVKSLHGTRHQVIPDRIEAGTY 240

+TVIENAAREPEIIDVATLLNNMGAHIRGAGTD+ITI+GV+ LHGTRHQVIPDRIEAGTY
Sbjct: 181 KTVIENAAREPEIIDVATLLNNMGAHIRGAGTDIITIQGVQKLHGTRHQVIPDRIEAGTY 240

Query: 241 IAMAAAIGRGIKVTNVLYEHLESFIAKLDEMGVRMTVEEDSIFVEEQERLKAVSIKTSPY 300

IA+AAAIG+G+K+TNVLYEHLESFIAKL+EMGVRMTVEED+IFVE+QE LKA++IKTSPY
Sbjct: 241 IALAAAIGKGVKITNVLYEHLESFIAKLEEMGVRMTVEEDAIFVEKQESLKAITIKTSPY 300

Query: 301 PGFATDLQQPLTPLLLTAEGNGSLLDTIYEKRVNHVPELARMGANISTLGGKIVYSGPNQ 360 

PGFATDLQQPLTPLLL A+G G+++DTIYEKR+NHVPEL RMGA+IS +GG+IVY GP++
Sbjct: 301 PGFATDLQQPLTPLLLKADGRGTIIDTIYEKRINHVPELMRMGADISVIGGQIVYQGPSR 360 

Query: 361 LSGAPVKATDLRAGAALVIAGLMAEGRTEITNIEFILRGYSNIIEKLTSLGADIQLVEE 419 

L+GA VKATDLRAGAALV AGL+AEG+TEITNIEFILRGY++II KLT+LGADIQL+E+
Sbjct: 361 I 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1433

A DNA sequence (GBSx1519) was identified in S.agalactiae <SEQ ID 4401> which encodes the amino
acid sequence <SEQ ID 4402>. This protein is predicted to be thiamine phosphate pyrophosphorylase
(thiE). Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0422 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 5 LKLYFVCGTVDCSR-KNILTWEKALQAGITLFQFREKGFTALQGKEKIAMAKQLQILCK 63

L +YF+CGT D +1 V++EAL+ GITL+QFREKG A G++K+A+AK+LQ LCK 
Sbjct: 7 LNWFICGTQDIPEGRTIQEVLKFALEGGITLYQFREKGNGAXTGQDKVALAKELQALCK 66 

Query: 64 QYQVPFIIDDDIDLVELIDADGLHIGQNDLPVDEARRRLPDKIIGLSVSTMDEYQKSQLS 123 

Y VPFI++DD+ L E IDADG+H+GQ+D VD+ R KIIGLS+ ++E S L+
Sbjct: 67 SYWPFIVNDDVALAEEIDADGIHVGQDDEAVDDFNNRFEGKIIGLSIGNLEELNASDLT 126 


Query: 184 GIAVISAISKANHIVDATRQ 203 

G++VISAI+++ H+ + + 
Sbjct: 187 GVSVISAIARSPHVTETVHK 206 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1434 

A DNA sequence (GBSx1520) was identified in S.agalactiae <SEQ ID 4403> which encodes the amino
acid sequence <SEQ ID 4404>. This protein is predicted to be hydroxyethylthiazole kinase (b2104). 
Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.94 Transmembrane 198 - 214 ( 194 - 217)

Final Results

bacterial membrane --- Certainty=0.2975 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8805> which encodes amino acid sequence <SEQ ID 8806> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: -2.93 
GvH: Signal Score (-7.5): 1.61 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -4.
INTEGRAL Likelihood = -4.94 
PERIPHERAL Likelihood = 2.49 151 
modified ALOM score: 1.49 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2975 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF25543 GB:AF109218 ThiM [Staphylococcus carnosus] 
Identities = 114/253 (45%), Positives = 160/253 (63%), Gaps = 1/253 (0%) 

Query: 18 LEQLKEWPLTICITNNWKNFTANGLLALGASPAMSECIEDLEDLLKVADALLINIGTL 77 

L+Q++ +PL IC TN+WKNFTANGLL+LGASP MSE ++ ED VA ++LINIGTL 
Sbjct: 5 LDQIRTEHPLVICYTNDWKNFTANGLLSLGASPTMSEAPQEAEDFYPVAGSVLINIGTL 64 

Query: 78 TKESWQLYQEAIKIANKNQVPWLDPVAAGASRFRLEVSLDLLKNYSISLLTGNGSEIAA 137 

TK E KIAN+ + P+V DPVA GAS++R + LK +++ GN SEI A 

Sbjct: 65 TKHHEHAMLENAKIANETETPLVFDPVAVGASKYRKDFCKYFLKKIKPTVIKGNASEILA 124

Query: 138 LIGEKQASKGADGGIOTADLESIAVKANQVFDVPVVVTGETDAIAVRGEvRLLQNGSPLMP 197 

LI + KG D D+ IA KA + + +++TGETD I +V L NGS + 

Sbjct: 125 LIDDTATMKGTDSADNLDVvniAEKAYKEYQTAIILTGETDVIVQDNKVVKLSNGSHFLA 184 

Query: 198 LOTGTGCLLGAVIAAFIGSSDRSDDLACLTEAMTVYNVAGEIAEKVAKGKGVGSFQVAFL 257 

+TG GCLLGAV+ AF+ + + L EA++VYN+A E AE+++ KG G+F F+ 
Sbjct: 185 KITGAGCLLGAVVGAFL-FRNTHPSIETLIEAVSVYNIAAERAEQLSDSKGPGTFLTQFI 243

Query: 258 DALSQMKSEMIMD 270 

DAL ++ S+ + + 
Sbjct: 244 DALYRIDSDAVAE 256 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8806 (GBS398) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 75 (lane 6; MW 31.8kDa). 

The GBS398-His fusion product was purified (Figure 214, lane 5) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 314), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1435 

A DNA sequence (GBSx1521) was identified in S.agalactiae <SEQ ID 4405> which encodes the amino
acid sequence <SEQ ID 4406>. This protein is predicted to be ThiD (thiD). Analysis of this protein 
sequence reveals the following: 

signal seq 

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF25542 GB:AF109218 ThiD [Staphylococcus carnosus] 
Identities = 139/258 (53%) , Positives = 186/258 (71%) , Gaps = 4/258 (1%) 



Query: 8 LTIAGTDPSGGAGIMADLKTFQftRRTYGMAVVTSWAQNTCGVRGVQHIETAIIDQQLaC 67

LTIAGTDP+GGAG+MADLK+F A YGMA +TS+VAQNT GV+ + +++ + +QL
Sbjct: 8 LTIAGTDPTGGAGVMADLKSFHACGVYGM^ITSIVAQNTKGVQHIHNLDITWLKEQLDS 67

Query: 68 VYDDIKPKAVKTGMIAERETISLVASYLKKYPQ-PYVLDPVMVATSGHRLIDSDAVEALK 126

++DD P+A+KTGM+A +E + L+ SYL+KYP PYV+DPVM+A SG L+D AL+
Sbjct: 68 IFDDELPQAIKTGMIATKEMMELIRSYLEKYPDIPYVIDPVMIAKSGDSLMDDAGKHALQ 127

Query: 127 EDLLPLATIITPNLPEAEVLVGYDLSDEVSIIKAGYDIQKQYSVRNVLIKGGHLD--GLA 184

E LLPLA + TPNLPEAE +VG+ L E +I KAG + + V+IKGGH++ +A
Sbjct: 128 EILLPLADVATPNLPEAEEIVGFKLDTEEAIKKAGDIFINEIGSKGWIKGGHIEDKNIA 187

Query: 185 KDYLFLEKEGLITLSNQRINTIHTHGTGCTFAftWAAELAKGQSIIiNAVSTAKSFITSAI 244

KDYLF K+GL ++R +T HTHGTGCTF+AV+ AELAKG++I AV AK FI +I
Sbjct: 188 ... 246

Query: 245 ETAPELGLGNGPVNHTSY 262

+ PE+G G GPVNH +Y
Sbjct: 247 KYTPEIGQGRGPVNHFAY 264





There is also homology to SEQ ID 4408.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1436 

A DNA sequence (GBSx1522) was identified in S.agalactiae <SEQ ID 4409> which encodes the amino
acid sequence <SEQ ID 4410>. This protein is predicted to be TenA (tenA). Analysis of this protein
sequence reveals the following: 
Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2242 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF25541 GB:AF109218 TenA [Staphylococcus carnosus] 
Identities = 78/213 (36%) , Positives = 127/213 (59%) , Gaps = 6/213 (2%) 



Query: 14 IQSIYQDPFIQGIIKGRLDHDVICHYLQADNIYLGKFADIYALCIAKSDNLRDKQFFLEQ 73
I IYQD FIQ ++KG + + + YL+AD YL +FA+IYAL + +L +F ++Q
Sbjct: 15 IDEIYQDHFIQELrjKGDIKKEALRQYLRADASYLREFANIYALLIPIMPDIjESVRFLVDQ 74

Query: 74 IDFTLNREIADGEGPHQAIAAYTNRSYQDIIEKGVWYPSADHYIKHMYFHFY-ENGIAGA 132
I F +N E+ H+ +A Y +Y +I++K W PS DHYIKHMY++ Y A A
Sbjct: 75 IQFIVNGEVE AHEYMADYIGENYNEIVQKKVWPPSGDHYIKHMYYNVYAHENAAYA 130

Query: 133 IAAMSPCPWIYHQI^KKIIEENQFLNGNPFNNWITFYANDTVEELMENYFRMMDYYAQNL 192
+AAM+PCP++Y +AK+ +++ + W FY N ++ L+E +M+ N+
Sbjct: 131 IAAmPCPYWAMIAKRAMKDPNMKSSILAKPJFEFY-NTEMDPLIEVLDDLMNQLTANM 189

Query: 193 SKEKQADLVDAFVKSCQHERRFFQMAINQEKWE 225
S+ ++ ++ + +++S HE FF MR EKW+
Sbjct: 190 SETEKNEVRENYLQSTVHELNFFNMAYTSEKWQ 222





WO 02/34771 



PCT/GB01/04789 



-1585- 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1437 

A DNA sequence (GBSx1523) was identified in S.agalactiae <SEQ ID 4411> which encodes the amino
acid sequence <SEQ ID 4412>. Analysis of this protein sequence reveals the following: 



Possible site: 35
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -7 Transmembrane 43 -
INTEGRAL Likelihood = -2 Transmembrane 92 -
INTEGRAL Likelihood = -1 Transmembrane 135 -
INTEGRAL Likelihood = -1 Transmembrane 69 -
INTEGRAL Likelihood = -0 Transmembrane 215 -

Final Results

bacterial membrane --- Certainty=0.3824 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.



No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8807> and protein <SEQ ID 8808> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: -1
McG: Discrim Score:
GvH: Signal Score (-7.5)

Possible site: 35
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 5 value: -7.06 threshold: 0.0
INTEGRAL Likelihood = - Transmembrane
INTEGRAL Likelihood = - Transmembrane
INTEGRAL Likelihood = - Transmembrane
INTEGRAL Likelihood = - Transmembrane
INTEGRAL Likelihood = -
PERIPHERAL Likelihood =
modified ALOM score: 1.91

*** Reasoning Step: 3
Final Results

bacterial membrane --- Certainty=0.3824 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1438 

A DNA sequence (GBSx1524) was identified in S.agalactiae <SEQ ID 4413> which encodes the amino
acid sequence <SEQ ID 4414>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3007 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA91229 GB:Z56283 orfl [Lactobacillus helveticus] 
Identities = 123/424 (29%), Positives = 200/424 (47%), Gaps = 48/424 (11%) 

Query: 17 LFDEVTFSLNPGERILISGYSGCGKSTLALLLSGL--KESGK--GQVLLNGSLIEPSDVG 72

L +++ ++ PG +LI G +GCGKSTL +++GL K +GK G++ L+G 
Sbjct: 12 LINQLNMNIAPGFNLLI-GPTGCGKSTLLKIIAGLYPKYAGKLTGKIDLHGQ KAA 65

Query: 73 FLFQNPDLQFCMDTVAHELYFILENLQIEPEQMQDRSEFVLAQVGLKGFQNRLIYTLSQG 132 

+FQN QF M T E+ F LENLQI+ + + + + ++ I TLS G 

Sbjct: 66 MMFQNAAEQFTMTTPREEIIFALENLQIKAKDYDLHIKKAVEFTKIADLLDQKINTLSGG 125

Query: 133 EKQRLAIATIFLKSPKLIILDEAFANLDQESASQLLQLVLNYQANNQSMLIVIDHLITYY 192

++Q +ALA + + +LDE FA+ D + L++ + + ++ +1+ DH++ Y 

Sbjct: 126 QQQHVAIAVLIAMDVDVFLLDEPFASCDPNTRHFLIEICLASrAETGRT-IILSDHVLDDY 184 

Query: 193 QDIMDHYFWLEKRLTRVNFDYMLNRLNVFELEKKSHN TGDKLLSIKDFQVK- 243 

+ I DH + E + + N+L F+ K+ H TG + + Q+K 

Sbjct: 185 EKICDHLYQFEGKTVKEIjSANEKNKL--FKQNKQFHEQSYSFALPTGTPVFELNKTQIKQ 242 

Query: 244 LSKNKFISYLDFDLASGERLCLDGPSGVGKSSLFMGLLGLYRTKGK KQ 291

L +NK Y G+ + G +GVGK+SLF + + KG + 

Sbjct: 243 NRLLLKQNKLKIY GKTTLITGSNGVGKTSLFKAMTKMIPYKGNFTYLDNEISK 295 

Query: 292 FTHRKQIP-ISFLFQNPLDQFIFSTVYDEIFQVCKDSN KARDILETINLWDKKQ 344

+RK + 1+ FQ DQF+ TV DEI KD N K + LE + L 

Sbjct: 296 IKYRKYLSQIAQFFQKASDQFLTVTVKDEIELSKKDRNNFFTDAKIDEWLEKLQLKQHLD 355

Query: 345 FSPFQLSQGQQRRLAIGSILASDSKLLLLDEPTYGQDAYHANMITTLLLSYCHKNHCGVI 404

+ LS GQQ++L I +L + +LL+DEP G D +++ L+ K + 

Sbjct: 356 QWYSLSGGQQKKLQILLMLMTKH1IVLLIDEPLSGLDHESVDLVLQLMQECQEKLQQTFL 415 

Query: 405 FTSH 408 



Query: 28 GERILISGYSGCGKSTLALLLSGLKESGKGQVLLNGSLIEP SDVGFLFQNPDLQ 81 

G+ LI+G +G GK++L ++ + L+ + + S + FQ Q 

Sbjct: 256 GKTTLITGSNGVGKTSLFKAMTKMIPYKGNFTYLDNEISKIKYRKYLSQIAQFFQKASDQ 315 



Query: 82 FCMDTVAHELYFILENLQIEPEQMQDRSEFV- -LAQVGLKGFQNRLIYTLSQGE 133
F TV E+ +DR+ F L ++ LK ++++Y+LS G+
Sbjct: 316 FLTVTVKDEIEL SKXDRNNFFTDAKIDEWLEKLQLKQHLDQWYSLSGGQ 365

Query: 134 KQRLAIATIFLKSPKLIILDEAFANLDQESASQLLQLVLNYQANNQSMLIVIDHLITYYQ 193

+++L + + + ++++DE + LD ES +LQL+ Q Q ++I H I 
Sbjct: 366 QKKLQILLMLMTKHNVLLIDEPLSGLDHESVDLVLQLMQECQEKLQQTFLIISHQIDALA 425

Query: 194 DIMDH 198 

D D+ 
Sbjct: 426 DFCDY 430 
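Each GENPEPT hit and each GAS/GBS comparison in this section is summarised by a line such as "Identities = 123/424 (29%), Positives = 200/424 (47%), Gaps = 48/424 (11%)", giving a count over the alignment length together with a whole-number percentage. The percentages reported here are consistent with truncation rather than rounding (for example, 127/213 = 59.6% is given as 59%). The sketch below assumes that convention; the helper name `check_summary` is hypothetical. It recomputes each percentage so that a mis-transcribed count stands out:

```python
import re
from typing import List, Tuple

# One field of a BLAST-style summary, e.g. "Identities = 123/424 (29%)".
FIELD = re.compile(r"(\w+)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def check_summary(line: str) -> List[Tuple[str, int, int, int, int]]:
    """Return (field, count, length, printed %, recomputed %) for every
    field whose printed percentage differs from the recomputed one.

    Assumption: the printed percentage is 100 * count / length truncated
    to a whole number (not rounded).
    """
    mismatches = []
    for name, count, length, printed in FIELD.findall(line):
        count, length, printed = int(count), int(length), int(printed)
        recomputed = (100 * count) // length  # integer division truncates
        if recomputed != printed:
            mismatches.append((name, count, length, printed, recomputed))
    return mismatches


if __name__ == "__main__":
    line = "Identities = 123/424 (29%), Positives = 200/424 (47%), Gaps = 48/424 (11%)"
    print(check_summary(line))  # [] : all three percentages agree
```

An empty list means every field is self-consistent under the truncation assumption. A line such as "Gaps = 3/57 (4%)" would be flagged, because 300 // 57 = 5.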

A related DNA sequence was identified in S.pyogenes <SEQ ID 4415> which encodes the amino acid 
sequence <SEQ ID 4416>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3093 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 120/455 (26%), Positives = 203/455 (44%), Gaps = 47/455 (10%) 

Query: 1 MLSWKLACTHGDSHYLFDEV-TFSLNPGERILISGYSGCGKSTLALLLSGLKE---SGK 56 

M+S E+L T+ D ++ T + G+ I++ G SG GKST LL+G+ +GK 

Sbjct: 21 MISAEQLVFTYHDQKWPACQISTCQIASGQFIVLCGPSGSGKSTFLKLLNGriPDYYAGK 80 

Query: 57 GQVLLNGSLIEPS DVGFLFQNPDLQFCMDTVAHELYFILENLQIEPEQMQD 107

+ L+ + + V +FQNP QF V HEL F EN ++ + + 

Sbjct: 81 YEGRLDVADCQAGRDSVETFSRSVASVFQNPASQFFYREVQHELVFPCENQGLDAKVIMK 140 

Query: 108 RSEFVLAQVGLKGFQNRLIYTLSQGEKQRLAIATIFLKSPKLIILDEAFANLDQESASQL 167

R + N+ ++ LS G+KQR+A+AT 4+ +++ DE ANLD + + 

Sbjct: 141 RLWTLAErjFAFAELLNKDMFGLSGGQKQRVAIATAIMQGTNIMLFDEPTANLDSAGIAAV 200 

Query: 168 LQLVLNYQANNQSMLIVIDHLITYYQDIMDHYFW LEKRLTRVNF DY 213

+ +A ++ +IV +H + Y D+ D++F+ L +LT N D 

Sbjct: 201 KAYLTQLKAAGKT-IIVAEHRLITfLTOIADNFFYFKNGRLTDKLTTQNLLALTDEQRQDM 259

Query: 214 MLNRLNVFELE KKSHNTGDKLLSIKDFQVKLSKNKFISYLDFDLASGERLCLD 266

L RL++ +L+ + H D L 1+ V+ AG + 

Sbjct: 260 GLRRLDLSDLKPVLAGKIESQHYRPDDSLCIEHLTVRAGSKILRCIEQLSFAVGSISGIT 319 

Query: 267 GPSGVGKSSLFMGLLGLYRTKGKKQFTHRKQIPISFLFQNPLDQFIFSTVYDEIF--QVC 324 

G +G+GKS L + G+ KK + IP+S + + V ++F V 

Sbjct: 320 GSNGLGKSQLVYYIAGI--LDDKKATIKFQGIPLSAKQRLSKTSIVLQEVSLQLFAESVS 377 

Query: 325 KDSN KARDILETINLWDKKQFSPFQLSQGQQRRLAIGSILASDSKLLLLDEPT 377

K+ N + ++4-E ,++L + P LS G+Q+R+ I + L +D +L-)- DEP+ 

Sbjct: 378 KEVNLGHERHPRTTEVIERLSLTTLLERHPASLSGGEQQRVMIAASLLADKDILIFDEPS 437 

Query: 378 YGQDAYHANMITTLLLSYCHKNHCGVIFTSHDPHL 412

G D + LL+ H VI SHD L 

Sbjct: 438 SGLDLLQMKALANLLMQ-LKTQHKVVILISHDEEL 471 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1439 

A DNA sequence (GBSx1525) was identified in S.agalactiae <SEQ ID 4417> which encodes the amino
acid sequence <SEQ ID 4418>. Analysis of this protein sequence reveals the following:






Possible site: 42

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = Transmembrane 8 - 24
INTEGRAL Likelihood = Transmembrane 145 - 161
INTEGRAL Likelihood = Transmembrane 66 - 82
INTEGRAL Likelihood = Transmembrane 112 - 128
Likelihood = Transmembrane 43 -

Final Results

bacterial membrane --- Certainty=0.5649 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13180 GB:Z99110 ykoE [Bacillus subtilis] 
Identities = 68/177 (38%), Positives = 117/177 (65%), Gaps = 1/177 (0%) 

Query: 5 LKDVLLIALLAWLGWYFGAGYISNAFVPFVGPIAHEVIYGIWFVAGPMALYILRKPGT 64 

+K++++++++++V WY + N GPIA+E IYGIWF+ +A Y++RKPG 

Sbjct: 6 VKEIVIMSVISIVFAWYLLFTHFGNVLAGMFGPIAYEPIYGIWFIVSVIAAYMIRKPGA 65 

Query: 65 AIVAELLAALIEVLIGSIYGPSVLVIGTLQGLGSELGFTLFRYHNYKLPAFILSAILTSI 124 

A+V+E++AAL+E L+G+ GP V+VIG +QGLG+E F R+ Y LP +L+ + +S+ 
Sbjct: 66 ALVSEIIAALVECLLGNPSGPMVIVIGIVQGLGAEAVFLATRWKAYSLPVLMLAGMGSSV 125 

Query: 125 FSFAWSFYANGLSAFSFSYNILMLIVRTVS-SIIFFLLTKNICDQLHRSGVLNAYGI 180 

SF + + 4G +A+S Y ++ML++R +S +++ LL K + L +GVLN + 
Sbjct: 126 ASFIYDLFVSGYAAYSPGYLLIMLVIRLISGALLAGLLGKAVSGSLAYTGVLNGMAL 182 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1440 

A DNA sequence (GBSx1526) was identified in S.agalactiae <SEQ ID 4419> which encodes the amino
acid sequence <SEQ ID 4420>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.69 Transmembrane 65 - 81 

INTEGRAL Likelihood = -6.37 Transmembrane 34 - 50 

INTEGRAL Likelihood = -6.10 Transmembrane 176 - 192 

Likelihood = -3.66 Transmembrane 130 - 146 

Likelihood = -1.97 Transmembrane 3-19 

Transmembrane 88 - 104

Final Results

bacterial membrane --- Certainty=0.3675 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9757> which encodes amino acid sequence <SEQ ID 9758>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8809> and protein <SEQ ID 8810> were also identified. Analysis of this
protein sequence reveals the following:
Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -4.09
GvH: Signal Score (-7.5): -4.38

Possible site: 47
>>> Seems to have no N-terminal signal sequence
ALOM program count: 6 value: -6.69 threshold: 0.0
Likelihood = -6.69 Transmembrane
Likelihood = -
Likelihood = -
Likelihood = -
Likelihood = -
Likelihood = -
Likelihood =
modified ALOM score: 1.84

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3675 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1441 

A DNA sequence (GBSx1527) was identified in S.agalactiae <SEQ ID 4421> which encodes the amino
acid sequence <SEQ ID 4422>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have a cleavable N-term signal seq. 



Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8811> and protein <SEQ ID 8812> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
McG: Discrim Score: 6.01 
GvH: Signal Score (-7.5): 0.45 

Possible site: 23 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 10.66 threshold: 0.0 
PERIPHERAL Likelihood = 10.66 80 
modified ALOM score: -2.63 

*** Reasoning Step: 3 

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
SEQ ID 4422 (GBS19) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 4 (lane 4; MW 24kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 9 (lane 6; MW 46.1kDa). 

The GST-fusion protein was purified as shown in Figure 190, lane 10. 

Example 1442 

A DNA sequence (GBSx1528) was identified in S.agalactiae <SEQ ID 4423> which encodes the amino
acid sequence <SEQ ID 4424>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8813> which encodes amino acid sequence <SEQ ID 8814> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6
SRCFLG: 0 

McG: Length of UR: 23 

Peak Value of UR: 2.61 

Net Charge of CR: 3 
McG: Discrim Score: 9.08 
GvH: Signal Score (-7.5): -0.76 

Possible site: 22 
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 23 
ALOM program count: 0 value: 5.14 threshold: 0.0 
PERIPHERAL Likelihood = 5.14 365 
modified ALOM score: -1.53 

*** Reasoning Step: 3 

Rule gpol 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
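The "Final Results" blocks in these examples list a certainty score for the cytoplasm, membrane and outside compartments, with at most one compartment marked (Affirmative). The following is a minimal sketch that collects the scores and reports the highest non-zero one. It assumes one compartment per line in a form such as "bacterial outside --- Certainty=0.3000 (Affirmative)", and the function names are hypothetical. The pattern also tolerates the stray spaces around the decimal point that appear in some extracted lines:

```python
import re
from typing import Dict, Optional

# Assumed line shape: "bacterial outside --- Certainty=0.3000 (Affirmative)".
# Spaces around the decimal point ("0 . 3000") are tolerated.
RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*(\d+)\s*\.\s*(\d+)"
)


def parse_final_results(block: str) -> Dict[str, float]:
    """Map each compartment named in a Final Results block to its certainty."""
    return {
        place: float(f"{whole}.{frac}")
        for place, whole, frac in RESULT.findall(block)
    }


def predicted_location(block: str) -> Optional[str]:
    """Return the compartment with the highest certainty, or None when every
    score is zero (the all "Not Clear" case)."""
    scores = parse_final_results(block)
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0.0:
        return None
    return best


if __name__ == "__main__":
    block = """Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>"""
    print(predicted_location(block))  # outside
```

Returning None for an all-zero block reflects cases such as the ThiD example above, where no compartment is assigned.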

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA34476 GB:X16457 precursor polypeptide (AA -26 to 632) 
[Staphylococcus aureus] 
Identities = 93/372 (25%), Positives = 160/372 (43%), Gaps = 46/372 (12%) 

Query: 9 MKKQFLKSAAILSLAVTAVSTSQPVGAIVGKDETKLRQQLGYIDSKKSGKKIDBRWGEKI 68 

MKKQ + A L++A + + AXV KD+K + + KG+ + +KI 
Sbjct: 1 MKKQIISLGA-IAVASSLFTWDNKADAIVTKDYSK ESRVNEKSKKGATVSDYYYWKI 56 

Query: 69 YNYLSYELIEANEWINRSEFQEPEYRTILSEFKDKIDSIEYYLINLS NIAKEDAHQ 124 

+ L + A + + ++ +P Y+ ++ + YL+ + K+ 

Sbjct: 57 IDSLEAQFTGAIDLLENYKYGDPIYKEAKDRLMTRVLGEDQYLLKKKIDEYELYKKWYKS 116 

Query: 125 RNILQSLDKYEKSGIYNLDQGVYNYIYQEISSAKHKFSDGVDKIYRLDSTLFPFSVWYDK 184 

N ++ + K +YNL YN 1+ + A ++F+ V +1 + L F 
Sbjct: 117 SNKNTOMLTFHKYI^YNLTMNEYNDIFNSLKDAVYQFNI<EVKEIEHKNVDLKQF 170 

Query: 185 HLDNNDNYKDNKDFKEYIALLNEITRKARLGYQIVNNHKD-GEHKDEAEI-LDILIRDIT 242 
Sbjct: 171 DKDGEDKATKEVYDLVSEIDTLWTYYA DKDYGEHAKELRAKLDLILGDTD 221

Query: 243 FVSKBAPGYKYI PNKRIAAKI IEDLDGI INDFFKNTGKDKP-SLEKLKDTEFHKKYLNST 301 
K I N+RI ++I+DL+ II+DFF T +++P S+ K T+ + K +

Sbjct: 222 NPHK ITNERIKKEMIDDLNSIIDDFFMETKQNRPNSITKYDPTKHNFKEKSEN 274 

Query: 302 EPYSIETNLPSNYKELKEKQIKKLEYGYK-KSSKIY- -TSAHYALYSEEIDAAKELLQKV 358 
+P N +E K K +K+ + +K K+ K Y T + EE + Ii KV 

Sbjct: 275 KP NFDKLVEETK-KAVKEADESWKNKTVKKYEETVTKSPVVKEEKKVEEPQLPKV 328

Query: 359 KIAKDNYNEIKS 370 

N E+K+ 
Sbjct: 329 GNQQEVKT 336 


No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8814 (GBS119) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 29 (lane 2; MW 84.3kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 35 (lane 5; 2 bands). 

The GBS119-GST fusion product was purified (Figure 109A; see also Figure 201, lane 6) and used to
immunise mice (lane 1+2+3 product; 20ug/mouse). The resulting antiserum was used for Western blot, 
FACS (Figure 109B), and in the in vivo passive protection assay (Table III). These tests confirm that the 
protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1443 

A DNA sequence (GBSx1529) was identified in S.agalactiae <SEQ ID 4425> which encodes the amino
acid sequence <SEQ ID 4426>. This protein is predicted to be S-adenosylmethionine synthetase (metK).
Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3609 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07019 GB:AP001518 S-adenosylmethionine synthetase [Bacillus halodurans] 
Identities = 266/390 (68%), Positives = 324/390 (82%), Gaps = 1/390 (0%) 

Query: 4 RKLFTSESVSEGHPDKIADQISDAILDAILEQDPDAHVAAETAVYTGSVHVFGEISTTAY 63 

R+LFTSESV+EGHPDKI DQISD+ILD 1L++DP+A VA ET+V TG V V GEI+T+ Y 
Sbjct: 7 RRLFTSESVTEGHPDKICDQISDSIIiDEILKEDENARVACETSVTTGLVLVAGEITTSTY 66 

Query: 64 VDINRVVRNTIAEIGYDKAEYGFSAE3VGVHPSLVEQSPDIAQGVNEALEVR-GSLEQDP 122 

VDI +WR+TI IGY +A+YGF +E+ V S+ EQSPDIAQGVN+ALE R G + 
Sbjct: 67 VDIPKVVRDTIFJIIGYTRAKYGFDSETCAvXjTSIDEQSPDIAQGVNQALEAREGQMTDAE 126 

Query: 123 LDLIGAGDQGLMFGFAVDETPELMPLPISLAHQLVKKLTDLRKSGELTYLRPDAKSQvTV 182 

++ IGAGDQGLMFG+A +ETPELMPLPI SL+H+L ++L++ RK L YLRPD K+QVTV 
Sbjct: 127 IFAIGAGDQGLMFGYANl^TPELMr^PISLSHKIiARRLSFJU?KGEILPYLRPDGKTQVTV 186 



Query: 183 EYDENDQPIRVDAWISTQHDPN\'TNDQLHKDVIEKVINEVIPSHYLDDQTKFFINPTGR 242 
Sbjct:



Query: 303 IiAKJCVEVQLAYAIGVAQPVSVRVDTFGTGVIAEADLEAAVRQIFDLRPAGIINMLDLKRP 362

LA K EVQIAYAIGVA+PVS+ +DTFGTG ++EA L VR+ FDLRPAGII MLDL+RP 
Sbjct: 307 IADKCEVQLAYAIGVAKPVSISIDTFGTGQVSEARLVELVREHFDLRPAGIIKMLDLRRP 366 

Query: 363 IYRQTAAYGHMGRTDIDLPWERVDKVQALK 392 

IY+QTAAYGH GRTD++LPWE+ DK + L+ 
Sbjct: 367 IYKQTAAYGHFGRTDVELPWEQTDKAEILR 396 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4427> which encodes the amino acid 
sequence <SEQ ID 4428>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3389 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 333/395 (84%) , Positives = 361/395 (91%) , Gaps = 1/395 (0%) 

Query: 1 MSERKLFTSESVSEGHPDKIADQISDAILDAILEQDPDAHVAAETAVYTGSVHVFGEIST 60
MSERKLFTSESVSEGHPDKIADQISDAILDAIL +DP+AHVAAET VYTGSVHVFGEIST
Sbjct: 1 MSERKLFTSESVSEGHPDKIADQISDAILDAILAEDPEAHVAAETCVYTGSVHVFGEIST 60

Query: 61 TAYVDINRVVRNTIAEIGYDKAEYGFSAESVGVHPSLVEQSPDIAQGVNEALEVRGSLEQ 120
TAY+DINRVVR+TIAEIGY +AEYGFSAESVGVHPSLVEQS DIAQGVNEA E R +
Sbjct: 61

Query: 121
D L IGAGDQGLMFGFA++ETPELMPLPISL+EQLV++L +LRKSGE++YLRPDAKSQV
Sbjct: 120

Query: 181
Sbjct: 180

Query: 241
GRWIGGPQGDSGLTGRKIIVDTYGGYSRHGGGAFSGKDATKVDRSASYAARYIAKN+
Sbjct: 240

Query: 301
A L K EVQLAYAIGVAQPVSVRVDTFGT + EA LEAAVRQ+ FDLRPAGII MLDLK
Sbjct: 300

Query: 361
RPIY+QTAAYGHMGRTDIDLPWER++KV AL + H
Sbjct: 360



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1444

A DNA sequence (GBSx1530) was identified in S.agalactiae <SEQ ID 4429> which encodes the amino
acid sequence <SEQ ID 4430>. This protein is predicted to be a transcriptional repressor of the biotin
operon. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 188 - 204 ( 188 - 204)

Final Results

bacterial membrane --- Certainty=0.1055 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9755> which encodes amino acid sequence <SEQ ID 9756>
was also identified.



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05404 GB:AP001512 transcriptional repressor of the biotin 
operon [Bacillus halodurans] 
Identities = 102/315 (32%), Positives = 169/315 (53%), Gaps = 18/315 (5%) 



Query: 10 ILSKNNNFISGETMANQLNISRTAIWKGIKTLEELGLEIESVTNKGYRLVSG-DILLPEQ 68
+ L+ ++F+SGE ++ + SRTA+WK 1+ L + G E+E+V KGYR+V D + P
Sbjct: 9 LLTAGDDFVSGEKISQAIGCSRTAVWKHIEELRKSGYEVEAVQRKGYRIVKRPDQIKPHD 68

Query: 69 LE QEIGIKVSIiNNNSASTQLDAKMGIESKLKTPHLFLAPNQKKAKGRFDRPFFTS 123
++ + G +++ ++ASTQ A + K H+ LA Q KGR R +++
Sbjct: 69 IQVVLETERFGREITYLESTASTQTVALKLAQEGAKEGHIvIjRNF.QTSGKGRMGRGWYSP 128

Query: 124 NQGGIYMSLLLQPNVPIEDIKPYTVMVASSAVKAISRLTGITPEIKWVNDIYLDNKKIAG 183
I MS++ +P +P + T+4- A + V+AI TG+ +IKW ND+ +D KKI G
Sbjct: 129 PGSSISMSIIFRPQLPPQKAPQLTLLTAVAIVRAIKETTGLDSDIKWPNDLLIDGKKIVG 188

Query: 184 ILTEAIASVESGLVTNVIIGLGINFYIKE--FPRALTKRAGSLFTEQ-PTITRNQLITEI 240 

ILTE A +S V 4VI G+GIN +E F + K A SL ++ I R LI I 
Sbjct: 189 ILTEMQADQDS--VHSVIQGIGINVNHQEEAFAEEIRKIATSLAIKKGEPIQRAPLIAAI 246

Query: 241 W NLFFNI PLEDHLK VYREKSLVLDRTVSFMDGQTMYSGKAIDITDKGYLVVEL 293

LF+++ L+ ++++ + + + + G A ITD G L++E 

Sbjct: 247 LKNIELFYDLYLQHGFSRIKPLWEAHAISIGKRIRARMLNDVKFGVAKGITDDGVLLLED 306

Query: 294 DDGQLKTLRSGEI SL 308 

DDG+L ++ S +1 + 
Sbjct: 307 DDGKLHSIYSADIEI 321

A related DNA sequence was identified in S.pyogenes <SEQ ID 4431> which encodes the amino acid
sequence <SEQ ID 4432>. Analysis of this protein sequence reveals the following: 

Possible site: 34
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.49 Transmembrane 194 - 210 ( 194 - 211)

Final Results

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB05404 GB:AP001512 transcriptional repressor of the biotin 
operon [Bacillus halodurans] 
Identities = 98/315 (31%) , Positives = 165/315 (52%) , Gaps = 18/315 (5%) 
Query: 10 LLSQTDDFVSGEYLADQLSISRTSVWKSIKSLENQGIQIDSLKHKGYRMVQG-DILLPKT 68

LL+ DDFVSGE ++ + SRT+VWK 1+ L G ++++++ KGYR+V+ D + P 

Sbjct: 9 LLTAGDDFVSGEKISQAIGCSRTAVWKHIEELRKSGYEVEAVQRKGYRIVKRPDQIKPHD 68

Query: 69 ISQGLGMPVTYTPHSQSTQLDAKQGIEAHNSAPRLYLAPSQEAAKGRLDRQFFSA 123



Query: 124 STGGIYMSMYLKPNVPYADMPPYTMMVASSIVKAISRLTGIDTEIKWVNDIYLGNHKVAG 183

I MS+ +P +P P T++ A +IV+AI TG+D++IKW ND+ + K+ G 
Sbjct: 129 PGSSISMSIIFRPQLPPQKAPQLTLLTAVAIVRAIKETTGLDSDIKWPNDLLIDGKKIVG 188 

Query: 184 ILTEAITSVETGLITDVIIGVGLNFFVTD--FPEAIAQKAGSLFTEK-PTITRNDLIIDI 240

ILTE + + VI G+G+N +FEI+ASL+K I R LI I 

Sbjct: 189 ILTE--MQADQDSVHSVIQGIGINVNHQEEAFAEEIRKIATSLAIKKGEPIQRAPLIAAI 246

Query: 241 WK LFLSIPVKDHVKVYKEKSLVLNKQVTFIENSQEKRAIAIDLTDQGHLIVQF 293

K L+L +++ ++ + K++ + K +A +TD G L+++ 

Sbjct: 247 LKNIELFYDLYLQHGFSRIKPLWEAHAISIGKRIRARMLNDVKFGVAKGITDDGVLLLED 306 

Query: 294 ENGDLQTLRSGEISL 308 

++G L ++ S +1 + 
Sbjct: 307 DDGKLHSIYSADIEI 321

An alignment of the GAS and GBS proteins is shown below. 

Identities = 191/311 (61%) , Positives = 257/311 (82%) 
Sbjct: 1 

GDILLP+ + Q +G+ V+ +S STQLDAK GIE+ P L+LAP+Q+ AKGR DR F 
Sbjct: 61 GDILLPKTISQGLGMPVTYTPHSQSTQLDAKQGIEAHNSAPRLYLAPSQEAAKGRLDRQF 120 

Query: 121 FTSNQGGIYMSLLLQPNVPIEDIKPYTVMVASSAVKAISRLTGITPEIKWVNDIYLDNKK 180

F+++ GGIYMS+ L+PNVP D+ PYT+MVASS VKAISRLTGI EIKWVNDIYL N K
Sbjct: 121 FSASTGGIYMSMYLKPNVPYADMPPYTMMVASSIVKAISRLTGIDTEIKWVNDIYLGNHK 180 

Query: 181 IAGILTEAIASVESGLVTNVIIGLGINFYIKEFPRALTKRAGSLFTEQPTITRNQLITEI 240 

+AGILTEAI SVE+GL+T+VI IG+G+NF++ +FP A+ ++AGSLFTE+PTITRN LI +1 
Sbjct: 181 VAGILTEAITSVETGLITDVIIGVGLNFFVTDFPEAIAQKAGSLFTEKPTITRNDLIIDI 240

Query: 241 WNLFFNIPLEDHLKVYREKSLVLDRTVSFMDGQTMYSGKAIDITDKGYLVVELDDGQLKT 300

W LF +IP++DH+KVY+EKSLVL++ V+F++ AID+TD+G+L+V+ ++G L+T 

Sbjct: 241 WKLFLSIPVKDHVKVYKEKSLVLNKQVTFIENSCEKRAIAIDLTDQGHLIVQFENGDLQT 300

Query: 301 LRSGEISLSSW 311 

LRSGEISLSSW 
Sbjct: 301 LRSGEISLSSW 311 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1445 

A DNA sequence (GBSx1531) was identified in S.agalactiae <SEQ ID 4433> which encodes the amino
acid sequence <SEQ ID 4434>. Analysis of this protein sequence reveals the following: 

Possible site: 15

>>> Seems to have an uncleavable N-term signal seq

Likelihood = -2.76 Transmembrane 3 - 19 ( 3 - 20)

Final Results

bacterial membrane --- Certainty=0.2105 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1446 

A DNA sequence (GBSx1532) was identified in S.agalactiae <SEQ ID 4435> which encodes the amino
acid sequence <SEQ ID 4436>. Analysis of this protein sequence reveals the following: 

Possible site: 15

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.28 Transmembrane 24 - 40 ( 24 - 40)

Final Results

bacterial membrane --- Certainty=0.1914 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4437> which encodes the amino acid
sequence <SEQ ID 4438>. Analysis of this protein sequence reveals the following:

Possible site: 49
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -1.91

Final Results

bacterial membrane --- Certainty=0.1765 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 37/67 (55%), Positives = 54/67 (80%), Gaps = 3/67 (4%)

Query: 1 MTKRQFIFMALLCSFETYFraQSVMDGSWIFAIFWGVDLLRDLQKVYAISKFTKELIK-- 58

MT RQF+FMA +C+FETYFFN ++ G+++FA+FWG+LL RDL++V+ I++ TK ++K 
Sbjct: 36 MTIRQFLFMAFVCAFETYFFNDLLLSGNYLFALFWGLLLFRDLRRVHTINQLTKTILKTA 95 

Query: 59 -STKKKD 64

S KKKD 
Sbjct: 96 NSPKKKD 102 
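In each Query and Sbjct row, the two numbers give the first and last residue positions of the segment shown. Gap characters (-) occupy alignment columns but not residue positions, so the last position equals the first plus the number of non-gap characters, minus one. The following small sketch of that bookkeeping uses hypothetical helper names, and it can be used to check whether a transcribed row has dropped or gained characters:

```python
def end_position(start: int, aligned: str) -> int:
    """Last residue number covered by one aligned row.

    Gap characters ('-') are alignment columns, not residues,
    so only the other characters advance the residue count.
    """
    residues = sum(1 for ch in aligned if ch != "-")
    return start + residues - 1


def row_is_consistent(start: int, aligned: str, end: int) -> bool:
    """True if the printed end coordinate matches the residue count."""
    return end_position(start, aligned) == end


if __name__ == "__main__":
    print(end_position(96, "NSPKKKD"))  # 102
    print(end_position(59, "-STKKKD"))  # 64
```

For the Sbjct row immediately above, 96 + 7 - 1 = 102, which matches the printed end position. This sketch counts any non-gap character as a residue, so spaces introduced by text extraction must be removed, or interpreted as lost gap characters, before a row is checked.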

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1447 

A DNA sequence (GBSx1533) was identified in S.agalactiae <SEQ ID 4439> which encodes the amino
acid sequence <SEQ ID 4440>. This protein is predicted to be DNA polymerase III, gamma subunit
(dnaZX). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1567 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4441> which encodes the amino acid 
sequence <SEQ ID 4442>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.59 Transmembrane 232 - 248 ( 232 - 249)

Final Results

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 408/558 (73%) , Positives = 473/558 (84%) , Gaps = 6/558 (1%) 

Query: 1 MYQALYRKYRSQTFDEMVGQSVISTTLKQAVSSKXISHAYLFSGPRGTGKTSAAKIFAKA 60 

MYQALYRKYRSQTFDEMVGQSVISTTLKQAV S KI SHAYLFSGPRGTGKTSAAKI FAKA 
Sbjct: 1 MYQALYRKYRSQTFDEMVGQSVISTTLKQAVESGKISHAYLFSGPRGTGKTSAAKIFAKA 60 

Query: 61 MNCPNQINGEPCNHCDICRDITNGSLEDVIEIDAASNNGVDEIRDIRDKSTYAPSRATYK 120 

MNCPNQ+ +GEPCN CDICRDITNGSLEDVIEIDAASNNGVDEIRDIRDKSTYAPSRATYK 
Sbjct: 61 MNCPNQVDGEPCNQCDICRDITNGSLEDVIEIDAASNNGVDEIRDIRDKSTYAPSRATYK 120

Query: 121 VYIIDEVHMLSTGAFNALLKTLEEPTENVVFILATTELHKIPATILSRVQRFEFKAIKLL 180

VYIIDEVHMLSTGAFNALLKTLEEPTENVVFILATTELHKIPATILSRVQRFEFKAIK
Sbjct: 121 VYIIDEVHMLSTGAFNALLKTLEEPTENVVFILATTELHKIPATILSRVQRFEFKAIKQK 180

Query: 181 AIRDHLAQILDKEAISYDLDALTLVARRAEGGMRDALSILDQALSLAKDNHISLDVAEEI 240 

AIR+HLA +LDKE I+Y++DAL L+ARRAEGGMRDALSILDQALSL+ DN +++ +AEEI 
Sbjct: 181 AIREHLAWVLDKEGIAYEVDALNLIARRAEGGMRDALSILDQALSLSPDNQVAIAIAEEI 240

Query: 241 TGSISLSAIDDYVSNILAHDTTEALAKLEVIFDSGKSMSRFATDLLMYLRDLLWQAGGE 300 

TGSIS+ A+ DYV + T+ALA LE I+DSGKSMSRFATDLL YLRDLLW+AGG+ 

Sbjct: 241 TGSISIIALGDYWWSQEQATQAIiAALETIYDSGKSMSRFATDLLTYIjFJ^LLVVKAGGD 300 

Query: 301 DSHSSDTFIANLNWQDILFEMIDKOTSvLPEIKNGSHPKVYAEMMTIQLSE^WEKNSS- 359 
+ S F NL++ D +F+MI VTS LPEIK G+HP++YAEMMTIQL++ + S
Sbjct: 301 NQRQSAVFDTNLSLSIDRIFQMITWTSHLPEIKKGTHPRIYAEMMTIQLAQKEQILSQV 360

Query: 360 NIPADVTAELDSLPJlELKSLKNEMSQL-SPJffiQSSSTQKVKVNNKTFTFKvDRTKILTIM 418 

N+ ++ +E+++L+ EL LK ++SQL SR D + + K K KT +++VDR IL IM 
Sbjct: 361 NLSGELISEIETLKNELAQLKQQLSQLQSRPDSLARSDKTK--PKTTSYRVDRVTILKIM 418 

Query: 419 EETVVDSQRSREYLEALKSAWNEILDNITAQDRALLMGSEPVLANSENAILAFDAAFNAE 478 

EETV +SQ+SR+YL+ALK+AWNEILDNI +AQDRALLMGSEPVLANSENAILAF+AAFNAE 
Sbjct: 419 EETvPJSTSQQSRQYLDALKNAWNEILDNISAQDRALLMGSEPVLANSENAILAFEAAFNAE 478 

Query: 479 QAMKRTDIWIFGNIMSKAAGFSPNILAVPRNDFNQIRSDFAKKMKAQK--TETEPEVNH 536

Q M R +LND+FGNIMSKAAGFSPNILAVPR DF IR +FA++MK+QK + E EV 
Sbjct: 479 C 

Query: 537 QIPEDFSYLAERIAIVED 554 

IPE F +L ++I ++D 
Sbjct: 539 DIPEGFDFLLDKINTIDD 556 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1448

A DNA sequence (GBSx1534) was identified in S.agalactiae <SEQ ID 4443> which encodes the amino
acid sequence <SEQ ID 4444>. Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence (or aa 1-19)

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.



Query: 8 ENYQLLLLQAQALFSDETNALANLSNASAMLNAMLPNSVFTGFYLFDGEELILGPFQGGV 67

E Y L+ Q AL E++A+ANL+NASA+L L + GFYL EL+LGPFQG 
Sbjct: 13 EKYSLVTICQLAALLEGESDAI ANLANASAIiLYH FLEEVNWVGFYL I KEGELVLGPFQGLP 72 

Query: 68 SCVHITLGKGVCGESAQTAKTLIVDDVTKHANYISCDSKAMSEIVVPMFKNGKLLGVLDL 127

+CV I +G+GVCG +A+ +T+ V+DV + +I+CD+ + SEIV+P+F+NG L GVLD+
Sbjct: 73 ACVRIPIGRGVCGTAAKEEQTVRVEDVHQFPGHIACDAASRSEIVIPLFQNGVLYGVLDI 132 

Query: 128 DSSLVADYDEIDQEYLEKFVGIL 150 

DS + + E +Q LE FV +L 
Sbjct: 133 DSPSUIRFSEEEQATiTiRSFVDVL 155 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4445> which encodes the amino acid 
sequence <SEQ ID 4446>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.1753 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 122/164 (74%) , Positives = 144/164 (87%) 

Query: 1 MNKSKKIENYQLLLLQAQALFSDETNALANLSNASAMLNAMLPNSVFTGFYLFDGEELIL 60

MNKSKKIE YQL++ QA+ LF++E+NALANLSNASA+LN LPNSVFTGFYLFDG+ELIL
Sbjct: 1 MNKSKKIEQYQLMIAQAKELFANESNALANLSNASALLNMTLPNSVFTGFYLFDGQELIL 60

Query: 61 GPFQGGVSCVHITLGKGVCGESAQTAKTLIVDDVTKHANYISCDSKAMSEIVVPMFKNGK 120

GPFQG VSCVHI LGKGVCGESAQ+ +T+I++DV +HANYISCD+ AMSEIVVPM K G
Sbjct: 61 GPFQGRVSCVHIKLGKGVCGESAQSRRTIIINDVKQHANYISCDAAAMSEIVVPMVKEGH 120

Query: 121 LLGVLDLDSSLVADYDEIDQEYLEKFVGILVEHTIWNLDMFGVE 164 

L+GVLDLDSSLVADYDE+DQEYLE FV + +E T + +MFGV+ 
Sbjct: 121 LIGVLDLDSSLVADYDEVDQEYLEAFVDLFLEKTTFTFNMFGVK 164 

SEQ ID 4444 (GBS282) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 9; MW 19.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 60 (lane 6; MW 44.8kDa) and in Figure 
63 (lane 7; MW 47kDa).
The GBS282-GST fusion product was purified (Figure 211, lane 4; see also Figure 225, lane 6) and used to
immunise mice. The resulting antiserum was used for FACS (Figure 269), which confirmed that the protein
is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1449 

A DNA sequence (GBSx1535) was identified in S.agalactiae <SEQ ID 4447> which encodes the amino
acid sequence <SEQ ID 4448>. This protein is predicted to be uridine kinase (udk). Analysis of this protein 
sequence reveals the following: 

n signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14675 GB:Z99117 uridine kinase [Bacillus subtilis] 
Identities = 133/207 (64%), Positives = 167/207 (80%) 

Query: 1 MRKKPIIIGVTGGSGGGKTSVSRAILSNFPDQKITMIEHDSYYKDQSHLTFEERVKTNYD 60

M K P++IG+ GGSG GKTSV+R+I F I MI+ D YYKDQSHL FEER+ TNYD
Sbjct: 1 MGKNPVVIGIAGGSGSGKTSVTRSIYEQFKGHSILMIQQDLYYKDQSHLPFEERUTTNYD 60

Query: 61 HPLAFDTNLMIEQLNELIEGRPVDIPVYDYTKHTRSDRTIRQEPQDVIIVEGILVLEDQR 120

HPLAFD + +IE + +L+ RP++ P+YDY HTRS+ T+ EP+DVII+EGILVLED+R
Sbjct: 61 HPLAFDITOYLIEHIQDLIJ^RPIEKPIYDYKLHTRSF.ETVHVEPKDVIILEGILVLEDKR 120 

Query: 121 LRDLMDIKLFVDTDDDIRIIRRIKRDMEERDRSLDSIIEQYTEWKPMYHQFIEPTKRYA 180

LRDLMDIKL+VDTD D+RIIRRI RD+ ER RS+DS+IEQY W+PM++QF+EPTKRYA
Sbjct: 121 LRDLMDIKLYVDTDADLRIIRRIMRDINERGRSIDSVIEQYVSWRPMHNQFVEPTKRYA 180 

Query: 181 DIVIPEGVSNIVAIDLINTKVASILNE 207

DI+IPEG N VAIDL+ TK+ +IL +
Sbjct: 181 DIIIPEGGQNHVAIDLMVTKIQTILEQ 207

A related DNA sequence was identified in S.pyogenes <SEQ ID 4449> which encodes the amino acid 
sequence <SEQ ID 4450>. Analysis of this protein sequence reveals the following: 

Possible site: 39
>>> Seems to have an uncleavable N-term signal seq



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9151> which encodes the amino acid sequence 
<SEQ ID 9152>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



WO 02/34771 



PCT/GB01/04789 



-1599- 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 173/207 (83%) , Positives = 193/207 (92%) 

Query: 1 MRKKPIIIGVTGGSGGGKTSVSRAILSNFPDQKITMIEHDSYYKDQSHLTFEERVKTNYD 60 

M KKPIIIGVTGGSGGGKTSVSRAIL +FP+ +I+MI+HDSYYKDQSH++FEERVKTNYD
Sbjct: 5 MLKKPIIIGVTGGSGGGKTSVSRAILDSFPNARISMIQHDSYYKDQSHMSFEERVKTNYD 64 

Query: 61 HPLAFDTNLMIEQLNELIEGRPVDIPVYDYTKHTRSDRTIRQEPQDVIIVEGILVLEDQR 120

HPLAFDT+ MI+QL EL+ GRPVDIP+YDY KHTRS+ T RQ+PQDVIIVEGILVLED+R
Sbjct: 65 HPLAFDTDFMIQQLKELIAGRPVDIPIYDYKKHTRSNTTFRQDPQDVIIVEGILVLEDER 124

Query: 121 LRDLMDIKLFVDTDDDIRIIRRIKRDMEERDRSLDSIIEQYTEWKPMYHQFIEPTKRYA 180

LRDLMDIKLFVDTDDDIRIIRRIKRDM ER RSL+SII+QYT WKPMYHQFIEP+KRYA
Sbjct: 125 LRDLMDIKLFVDTDDDIRIIRRIKRDMMERGRSLESIIDQYTSWKPMYHQFIEPSKRYA 184

Query: 181 DIVIPEGVSNIVAIDLINTKVASILNE 207 

DIVIPEGVSN+VAID+IN+K+ASIL E 
Sbjct: 185 DIVIPEGVSNVVAIDVINSKIASILGE 211

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1450 

A DNA sequence (GBSx1536) was identified in S.agalactiae <SEQ ID 4451> which encodes the amino
acid sequence <SEQ ID 4452>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.5083 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12572 GB:Z99108 similar to RNA helicase [Bacillus subtilis] 
Identities = 140/343 (40%) , Positives = 202/343 (58%) , Gaps = 9/343 (2%) 

Query: 10 QDKLTQRQFDDLTDIQNKLFQPITDGDNILGISPTGTGKTLAYLFPTLLKLQPK-KSQQL 68
Q+ F T +Q + Q I DG +++ S PTGTGKTLAY P L +++P+ K Q

Sbjct: 16 QENWNASGFQKPTPVQEQAAQLIMDGKDVIAESPTGTGKTLAYALPVLERIKPEQKHPQA 75 

Query: 69 LILAPNSELAGQIFDVTKEWAEPLGLTAQLFLSGSSQKRQIERLKKGPEILIGTAGRVFE 128
+ILAP+ EL QIF V ++W LA + G++ K+Q+E+LKK P I++GT GRVFE

Sbjct: 76 VILAPSRELVMQIFQVIQDWKAGSELRAASLIGGANVKKQVEKLKKHPHIIVGTPGRVFE 135

Query: 129 LVKLKKIKMMNINTIVLDEFDELLGDSQYHFVDNIINRVPRDQQMIYISATNKLDNS--- 185 

L+K KK+KM + TIVLDE D+L+ + II RD+Q++ SAT K + 

Sbjct: 136 LIKAKKLKMHEVKTIVLDETDQLVLPEHRETMKQIIKTTLRDRQLLCFSATLKKETEDVL 195 


Query: 186 -KLADNTITIDLSNQKLDT--IKHYYITVDKRERTDLLRKFSNIPDFRGLVFFNSLSDLG 242

+LA + + K + +KH Y+ D+R++ LL+K S + + LVF + +L 

Sbjct: 196 RELAQEPEVLKVQRSKAEAGKVKHQYLICIX2RDKvKIi^ 255 

Query: 243 ACEERLQFNRASAVSLASDINIKFRKVILEKFKNHDISLLLGTDLVARGIDIDNLEYVIN 302

E+L ++ h S+ R 1+ F++ + LLL TD+ ARG+DI+NL YVI + 

Sbjct: 256 VYAEKIAYHHVELGVLHSEAKKMERAiaiATFEDGEFPLLLATDIAARGLDIENLPYVIH 315 

Query: 303 FDIARDKETYTHRSGRTGRMGKEGCVITFVTHKEELKQLKKYA 345 
DI D++ Y HRSGRTGR GKEG V++ VT EE K LKK A

Sbjct: 316 ADIP-DEDGYVHRSGRTGRAGKEGNVLSLVTKLEESK-LKKMA 356 
A related DNA sequence was identified in S.pyogenes <SEQ ID 4453> which encodes the amino acid
sequence <SEQ ID 4454>. Analysis of this protein sequence reveals the following: 



Possible site: 39 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3847 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 273/358 (76%) , Positives = 312/358 (86%) 

Query: 1 MITKFPDQWQDKLTQRQFDDLTDIQNKLFQPITDGDNILGISPTGTGKTLAYLFPTLLKL 60 

MITKFP QWQ+KL Q F LT IQ + FQPI DG N LGISPTGTGKTLAY+FP LL L 
Sbjct: 12 MITKFPPQWQEKLDQVAFTHLTPIQEQAFQPIVDGKNFLGISPTGTGKTLAYVFPNLLAL 71

Query: 61 QPKKSQQLLILAPNSELAGQIFDVTKEWAEPLGLTAQLFLSGSSQKRQIERLKKGPEILI 120 

PKKSQQLLILAPN+ELAGQIF+VTK+WA+PLGLTAQLF+SG+SQKRQIERLKKGPEILI 
Sbjct: 72 TPKKSQQLLILAPNTELAGQIFEVTKDWAQPLGLTAQLFISGTSQKRQIERLKKGPEILI 131

Query: 121 GTAGRVFELVKLKKIKMMNINTIVLDEFDELLGDSQYHFVDNIINRVPRDQQMIYISATN 180

GT GR+FEL+KLKKIKMM++NTIVLDE+DELLGDSQY FV I + VPRD QM+Y+SATN 
Sbjct: 132 GTPGRIFELIKLKKIKMMSVNTIVLDEYDELLGDSQYDFVQKISHYVPRDHQMVYMSATN 191 

Query: 181 KLDNSKLADNTITIDLSNQKLDTIKHYYITVDKRERTDLLRKFSNIPDFRGLVFFNSLSD 240 

K+D + LA NT IDLS Q D I+H+Y+ VDKRERTDLLRKF+NIP FR LVFFNSLSD
Sbjct: 192 KVDQTSLAPNTFCIDLSEQTNDAIQHFYLMVDKRERTDLLRKFTNIPHFRALVFFNSLSD 251

Query: 241 LGACEERLQFNRASAVSLASDINIKFRKVILEKFKNHDISLLLGTDLVARGIDIDNLEYV 300 

LGA EERLQ+N A+AVSIASDIN+KFRK ILEKFK+H +SLLL TDLVARGIDIDNL+YV 
Sbjct: 252 LGATEERLQYNGAAAVSIASDINVKFRKTILEKFKSHQLSLLLATDLVARGIDIDNLDYV 311

Query: 301 INFDIARDKETYTHRSGRTGRMGKEGCVITFVTHKEELKQLKKYATVTELVLHNQKLH 358 

I+FD+ARDKE YTHR+GRTGRMGK G VITFV+H E+LK+LKK+A V+E+ L NQ+LH
Sbjct: 312 IHFDVARDKENYTHRAGRTGRMGKSGIVITFVSHPEDLKKLKKFAKVSEISLKNQQLH 369 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1451 

A DNA sequence (GBSx1537) was identified in S.agalactiae <SEQ ID 4455> which encodes the amino
acid sequence <SEQ ID 4456>. Analysis of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.38 Transmembrane 15 - 31 ( 13 - 31) 

Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
Example 1452

A DNA sequence (GBSx1538) was identified in S.agalactiae <SEQ ID 4457> which encodes the amino
acid sequence <SEQ ID 4458>. This protein is predicted to be peptidoglycan GlcNAc deacetylase. Analysis
of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -8.92 Transmembrane 4 - 20 ( 1 - 26) 



Final Results

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB96552 GB:AJ251472 peptidoglycan GlcNAc deacetylase
[Streptococcus pneumoniae] 
Identities = 133/431 (30%), Positives = 228/431 (52%), Gaps = 20/431 (4%) 

Query: IIGIFSLIIIAILAWQGFSFLKHK--EIKLQQAWEKEIRIAEKTVEWKRQKTERVLFL 62
+ IGI ++ I + + F + K E K++ EK+ +++E + RQ V+
Sbjct: 21 LIGILAISICLLGGFIAFKIYQQKSFEQKIESLKKEKDDQLSEGNQKEHFRQGQAEVIAY 80

Query: 63 EPKGYDKSLSADILKWNQKSFEHKKFYDNQYIILRPQLADSNFANVKKLSIYQILYQKEK 122

P +K +S+ NQ + + DN Q +S V ++ + +Y

Sbjct: 81 YPLCGEKVISSTOELINQDVKDKLESKrmVFYYTEQ-EESGLKGVVNRNVTKQIYDLVA 139

Query: 123 GSMFQKSSRLLRTYLLDQNKKPFELDELLAHNISGFKAILENIAPGTQLK--EHDSNKEF 180

[The remaining rows of this alignment (Sbjct from position 140, Query from position 181) are illegible in the source.]



A related DNA sequence was identified in S.pyogenes <SEQ ID 4459> which encodes the amino acid
sequence <SEQ ID 4460>. Analysis of this protein sequence reveals the following: 

Possible site: 22 
>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood =-12.58 



Final Results

bacterial membrane --- Certainty=0.6031 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the databases:

!GB:AJ251472 peptidoglycan GlcNAc deacetylase [Strep... 239 4e-62 

>GP:CAB96552 GB:AJ251472 peptidoglycan GlcNAc deacetylase 
[Streptococcus pneumoniae] 
Identities = 136/438 (31%) , Positives = 230/438 (52%) , Gaps = 23/438 (5%) 



[The alignment rows (Query from position 3, Sbjct from position 13) are illegible in the source.]



An alignment of the GAS and GBS proteins is shown below. 

Identities = 169/420 (40%) , Positives = 259/420 (61%) , Gaps = 12/420 (2%) 

Query: 4 LIIGIFSLIIIAILAWQGFSFLKHKEIKLQQAVWKEIRIAEKTVEVVKRQKTER--VLF 61

+++G+ S+++++ LA + K E + + EK+ ++ ++ VK K ++ + 
Sbjct: 7 ILVGLLSILMLS-IAI VFIITOWKLNEDSQRIVLAEKIO^ 65 

Query: 62 LEPKGYDKSLSADILKWNQKSFEHKKFYDNQYIILRPQLADSNFANVKKLSIYQILYQKE 121

P D L S KK D + I++RP+L S+ +V L+I +I+YQK+ 

Sbjct: 66 FSPIKQADDFFVDNLP VSLYKKKNSDKEL1ILVRPKLQSSHLRSVNTL1TISKIVYQKK 122 



Query: 182 KTGRVTDGLDVKDGKLIIlSro-LKIjPLDKLYWVIDESYLKSSDLDIiVSNL KAKAPR-- 235 

++DG +VK G LI + L +PL L++VI+ +L +SD N K + P+ 

Sbjct: 183 SNSLLSDGFEVKSGNLIFDKKLTIPLTTLFDVIOTDFIiANSDRAAYDNYRTYKEQHPKKL 242 

Query: 236 VALTFDDGPNEKTTPKALEILKRYNAKATFFVMGQSAVGHTDILQRMHAEGHEIGNHTWD 295 

VALTFDDGP+ TTP+ L+IL +Y AK TFF++G V + ++ +R+ GHEI NHTWD 
Sbjct: 243 VALTFDDGPDPTTTPQVLDILAKYQAKGTFFMIGSKVVNNENLTKRVSDAGHEIANHTWD 302 

Query: 296 HPNLTKLPAEKIKEEIHKTNDLIMKATGQKPVYLRPPYGATNATVKTVTGLKEMLWSVDT 355 

HPNLT L +I+ +++ TN I KA G+KP YLRPPYGATNATV+ +GL +MLW+VDT
Sbjct: 303 HPNLTI^SVSEIQHQVmiWQAIEKACGKKPRYLRPPYGATNATVQQSSGLTQMLWTVDT 362
Query: 356 EDWKMHOTQAMMTNIICKQLRPGGVILMHDIHQTTIDALFTIMDYLTTQGYYFVTVGELYS 415

DW+NH+T +MTN+K QL+PGGV+LMHDIHQTTI+ALET+M+YL +GY VTV ELY+ 
Sbjct: 363 RDVffilfflSTDGIMTIWKNQLQPGGVVLMHDIHQTTINALPTVMEYLKAEGYECVTVSELYA 422

GBS281d was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 152 (lanes 8-10; MW 71.5kDa) and in Figure 187 (lane 10; MW 71kDa). It was also
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 152 
(lane 12; MW 46.5kDa) and in Figure 183 (lane 2; MW 46kDa). Purified GBS281d-GST is shown in lane 6 
of Figure 237. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1453 

A DNA sequence (GBSx1539) was identified in S.agalactiae <SEQ ID 4461> which encodes the amino
acid sequence <SEQ ID 4462>. Analysis of this protein sequence reveals the following: 

o N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2488 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4463> which encodes the amino acid 
sequence <SEQ ID 4464>. Analysis of this protein sequence reveals the following: 

d N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2799 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 311/475 (65%) , Positives = 389/475 (81%) 

Query: 1 MTKEYQNYVNGEWKSSVNQIEILSPIDDSSLGFVPAMTREEVDHAMKAGREALPAWAALT 60 

+ K+Y+N VNGEWK S N+I I +P LG VPAMT+ EVD + ++AL W AL+ 

Sbjct: 1 LAKQYKNLWGEWKLSElTOITIYAPATGEELGSVPAI^TQAEVDAVi'ASAKK7ALSDWRALS 60 

Query: 61 VYERAQYLHKAADIIERDKEEIATVIAKEISKAYNASVTEVVRTADLIRYAAEEGIRLST 120

ERA YLHKAADI+ RD E+I +L+KE++K + A+V+EV+RTA++I YAAEEG+R+ 
Sbjct: 61 YTORAAYLHKAADILVRDAEKIGAILSKEVAKGHKAAVSEVIRTAEIINYAAEEGLRMEG 120 



Sbjct: 121 EV1EGGSFFAASKKKIAIWREPVGLVIAISPPNYPVNLAGSKIAPALIRGNWALKPPT 180 

Query: 181 QGSVSGLVIAKAFAEAGLPAGVFNTITGRGSEIGDYIVEHEEVNFINFTGSTPVGKRIGK 240 

QGS+SGL+IA+AFAEAG+PAGVFNTITGRGS IGDYIVEHE V+FINFTGSTP+G+ IGK 
Sbjct: 181 QGSISGLLLAEAFAEAGIPAGVFNTITGRGSVIGDYIVEHEAVSFINFTGSTPIGEGIGK 240

Query: 241 IAGMRPIMLELGGKDAGVVIADADLDNAAKQIVAGAYDYSGQRCTAIKRVLVVEEVADEL 300

LAGMRPIMLELGGKD+ +VL DADL AAK IVAGA+ YSGQRCTA+KRVLV+++VAD+L, 
Sbjct: 241 IAGMRPIMLELGGKDSAIVLEDADLAIAAKNIVAGAFGYSGQRCTAVKRVLVM^ 300 

Query: 301 AEKISEWAKDSVGDPFDNATVTPVIDDNSADFIESLVVDARQKGAKEILNEFKRDGRLLT 360
Query: 361 PGLFDHVTLDMKLAWEEPFGPILPIIRVKDAEEAVAIANKSDFGLQSSVFTRDFQKAFDI 420
P LFDHVT DM+LAWEEPFGP+LPIIRV EEA+ I+N+S++GLQ+S+FT +F KAF I

Sbjct: 361 PVLFDHVTTDMRLAWEEPFGPVLPIIRVTTVEEAIKISNESEYGLQASIFTTNFPKAFGI 420

Query: 421 ANKLEVGTVHINNKTGRGPDNFPFLGLKGSGAGVQGIRYSIEAMTNVKSIVFDMK 475
A +LEVGTVH+NNKT RG DNFPFLG K SGAGVQG++YSIEAMT VKS+VFD++
Sbjct: 421 AEQLEVGTVHLNNKTQRGTDNFPFLGAKKSGAGVQGVKYSIEAMTTVKSVVFDIQ 475

A related GBS gene <SEQ ID 8815> and protein <SEQ ID 8816> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: -15.11

GvH: Signal Score (-7.5): 0.17 

Possible site: 57 
>>> Seems to have i 
ALOM program cour 
PERIPHERAL Likelihood = 1.22 187

modified ALOM score: -0.74 

*** Reasoning Step: 3 

Final Results

bacterial cytoplasm --- Certainty=0.2488 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

66.8/82.6% over 474aa 

Streptococcus mutans 

EGAD|42413| NADP-dependent glyceraldehyde-3-phosphate dehydrogenase Insert characterized
EGAD|42413|110509 NADP-dependent glyceraldehyde-3-phosphate dehydrogenase Insert characterized

SP|Q59931|GAPN_STRMU NADP-DEPENDENT GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE (EC 1.2.1.9)
(NON-PHOSPHORYLATING GLYCERALDEHYDE 3-PHOSPHATE DEHYDROGENASE) (GLYCERALDEHYDE-3-PHOSPHATE
DEHYDROGENASE [NADP+]) (TRIOSEPHOSPHATE DEHYDROGENASE). Edit characterized
GP|642667|gb|AAA91091.1||L38521 NADP-dependent glyceraldehyde-3-phosphate dehydro Insert characterized

ORF01688(301 - 1725 of 2025) 

EGAD|42413|44796(1 - 475 of 475) NADP-dependent glyceraldehyde-3-phosphate dehydrogenase
{Streptococcus mutans} EGAD|42413|110509 NADP-dependent glyceraldehyde-3-phosphate
dehydrogenase {Streptococcus mutans} SP|Q59931|GAPN_STRMU NADP-DEPENDENT GLYCERALDEHYDE-3-PHOSPHATE
DEHYDROGENASE (EC 1.2.1.9) (NON-PHOSPHORYLATING GLYCERALDEHYDE 3-PHOSPHATE DEHYDROGENASE)
(GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE [NADP+]) (TRIOSEPHOSPHATE DEHYDROGENASE).
GP|642667|gb|AAA91091.1||L38521 NADP-dependent glyceraldehyde-3-phosphate dehydro

%Match =49.3

%Identity =66.7 %Similarity =82.5 

Matches = 317 Mismatches = 83 Conservative Sub.s = 75 

195 225 255 285 315 345 375 405 

*GLKNLYFFIESLDIVKFLRKICQIIEINR*SDRINLLQCKRRFTLTKEYQNYvNGEWKSSVNQIEILSPIDDSSLGFVP

=11=1=11111111 I 1=1=1 I = II II 
MTKQYKNYVNGEWKLSENEIKIYEPASGAELGSVP 



AMTREEVDHAMKAGREALPAWAALTVYERAQYLHKAADIIERDKEEIATVIAKEISKAYNASWEVTO 
11= 1111= = = = l II! 11= III I I I I 111= 1111 = 1 =l = ll = = l I : = l = llllll = = | IIIIM 
AMSTEEVDYWASAKKAQPAWRALSYIERAAYLHKVADILMRDKEKIGAILSI<EVAKGYKSAVSEVVRTAEIINYAAEEG 
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II^STSMHSGKMDASTGHKIAVIRRQFVGIV^ 

:|: III =|:: I = I I = I I = I I I = I I I I = I = I I I I I I = I I I I I I I I I 111 = 11111111=111=11=1111 

LRMEGEVLEGGSFEAASKKKIAVTOREPVGLVIAISPFOTPWIAGSKIAPALIAGOTIAFKPPTQGSISGLLLAEAFAE 
130 140 150 160 170 180 190 

915 945 975 1005 1035 1065 1095 1125 

AGLPAGVFOTITGRGSEIGDYIVEHEEWFINFTGSTPVGXRIGKIAGKRPIMLELGGKDAGVVIADADLDNAAKQIVAG 
imillllllllllllllllllll: MINIMI :| = lll|:||llllllllllll= = I I MM || 1 = 11 
AGLPAGVFNTITGRGSEIGDYIVEHQAVNFINFTGSTC-IGERIGKMAGKRPIMLELGGKDSAIVLEDADLELTAKNIIAG 
210 220 230 240 250 260 270 



1155 1185 1215 1245 1275 1305 1335 1365 

AYDYSGQRCTAIKRVLVWEVADELAEKISEWAKLSVGXPFDNATVTPVID^ 
|: ||||||ll=lllll=l I I I I I III I I l = = l I 1 = 1 =11 = 11 lll = = l 1= II III I I 11 =

AFGYSGQRCTAVKRVLVMESVADELVEKIREKVIJU^TIGNPEDDADITPLIDTKSADYVEGLIISroAMKGATALTEIKRE 
290 300 310 320 330 340 350 



1395 1425 1455 1485 1515 1545 1575 1605 

GRLLTPGLFDHWLDMKLAWEEPFGPILPIIRVKDAEEAVAIANKXDFGLQSSVFTRDFQKAFDIANKLEVGTVHINWKT
I 1= I III II 11=111111111=111111 111= 1=11 ==111=1=11 II =11 II =111111111111 
GNLICPlLFDKVTTDMRl^VffiEPFGPVLPIIRVrSVEEAIEISNKSEYGLQASIFraDFPRAFGIAEQLEVGTVHINNKT 
370 380 390 400 410 420 430 

1635 1665 1695 1725 1755 1785 1815 1845

GRGPDNFPFLGLKGSGAGVQGIRYSIEaMTNVKSIVFDMK*T*NDSTIVS*VVL*TSFTI 1 KIKNYIIF*SGFIFVI*LS* 

II MM I II MMM MM 111 = 111 = 1 

QRGTDNFPFLGAKKSGAGIQGVKYS IEAMTTVKS WFDIK 
450 460 470 


SEQ ID 8816 (GBS127) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 29 (lane 10; MW 55.9kDa). 
GBS127-His was purified as shown in Figure 200, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.



Example 1454 

A DNA sequence (GBSx1540) was identified in S.agalactiae <SEQ ID 4465> which encodes the amino
acid sequence <SEQ ID 4466>. Analysis of this protein sequence reveals the following:

Possible site: 17 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.37 Transmembrane 427 - 443 ( 427 - 443) 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA78049 GB:AB027569 phosphoenolpyruvate-protein 
phosphotransferase [Streptococcus bovis]

Identities = 534/577 (92%) , Positives = 559/577 (96%) 



Query: 1 MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVEDTNAEEARLDVALQASQDELSVIRE 60 
MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVEDT+AEEARLD AL+ASQDELS+IRE
Sbjct: 1 MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVEDTSAEEARLDAALKASQDELSIIRE 60

Query: 61 KAVESLGEEAAAVFDAHLMVLSDPEMINQIKETIRAKQVNAETGLKEVTDMFITIFEGME 120 

KAVE+LGEEAAAVFDAHLMVL+DPEMI+QIKETIRAKQ NAE GLKEVTDMFITIFEGME
Sbjct: 61 KAVETLGEEAAAVFDAHLMVLADPEMISQIKETIRAKQTNAEAGLKEVTDMFITIFEGME 120
[Rows 121-540 of this alignment are only partly legible in the source; the surviving consensus lines follow.]

DNPYMQERAADIRDVAKRVLAHLLG KLPNPATI+EESIVIAHDLTPSDTAQLNKQFVKA

FVTNIGGRTSHSAIMARTLEIAAVLGTNDIT RV+DG ++AVNGITGEVII PT+ Q++

FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPWTOTMDIGGDKELPY DLPKEMNPFLGFR

ALRISISETG+AMFRTQIRALLRASVHGQLRIMFPMVALLKEFRAAKAIF+EEKANL A+

GVAV++ I+VGIMIEIPAAAMIADQFAKFATOFFSIGTTOLIQYTMAADRMNEQVSYLYQP

YNPS ILRLINNVI KAAHAEGKW GMCGEMAGDQ AVPLLV MGLDEFSMSATS+LRTRSL

MKKLDTAKM+EYANRAL+ECSTMEEV+EL KEYV+ D
Sbjct: 541 MKKLDTAKMQEYANRALTECSTMEEVLELSKEYVNVD 577

A related DNA sequence was identified in S.pyogenes <SEQ ID 4467> which encodes the amino acid 
sequence <SEQ ID 4468>. Analysis of this protein sequence reveals the following: 

:erminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0875 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 540/577 (93%) , Positives = 561/577 (96%) 

Query: 1 MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVEDTNAEEARLDVALQASQDELSVIRE 60
MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTV DTNAEEARLDVALQA+QDELSVIRE
Sbjct: 1 MTEMLKGIAASDGVAVAKAYLLVQPDLSFETVTVADTNAEEARLDVALQAAQDELSVIRE 60

[Rows 61-300 are only partly legible in the source; the surviving lines follow.]

AVESLGEEAAAVFDAHLMVL+DPEMI+Q+KETIRAKQ NAETGLKEVTDMFITIFEGME

DNPYMQERAADIRDVAKRVLAHLLSVKLPNPATINEESIVIA-IDLTPSDTAQLNKQFVKA

Query: 181 FVTNIGGRTSHSAIMARTLEIAAVLGTNDITERVCDGQLIAVNGITGEVIIEPTEAQISA 240
FVTNIGGRTSHSAIMARTLEIAAVLGTNDIT+RV+DG +IAVNGITGEVII+P+E Q+ A

FK AG AYAKQKAEW+LLKDA T TADGKHFELAANIGTPKDVEGVN+NGAEAVGLYRTE
Query: 301 FL-MDSQDFPTEDEQYEAYKAVLEGMKGKP^m^TMDIC-GDKKLPYFDLPKEMNPFLGFR 360

FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPVWRTMDIGGDKELPYFDLPKEMNPFLGFR 
Sbjct: 301 FLYMDSQDFPTEDEQYEAYKAVLEGMNGKPVWRTMDIGGDKELPYFDLPKEMKPFLGFR 360 


Query: 361 ALRISISETGDAMFRTQIRALLRASVHGQLRIMFPMVALLKEFRAAKAIFEEEKANLLAD 420

ALRISISETGDAMFRTQ+RALLRASVHGQLRIMFPMVALLKEFRAAKA+F+EEKANLIA+
Sbjct: 361 ALRISISETGDAMFRTQMRALLRASVHGQLRIMFPMVALLKEFRAAKAVFDEEKMLLAE 420

Query: 421 GVAVAEGIEVGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP 480

GVAVA+ I+VGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP 
Sbjct: 421 GVAVADDIQVGIMIEIPAAAMLADQFAKEVDFFSIGTNDLIQYTMAADRMNEQVSYLYQP 480 

Query: 481 YNPSILRLINNVIKAAHAEGKWAGMCGEMAGDQTAVPLLVGMGLDEFSMSATSVLRTRSL 540
YNPSILRLINNVIKAAHAEGKWAGMCGEMAGDQ AVPLLVGMGLDEFSMSATSVLRTRSL

Sbjct: 481 YHPSILRLINNVIKAAHAEGKWAGMCGEMAGDQQAVPLLVGMGLDEFSMSATSVLRTRSL 540 

Query: 541 MKKLDTAKMEEYANRALSECSTMEEVIELQKEYVDFD 577 
MKKLD+AKMEEYANRAL+ECST EEV+EL KEYV D 
Sbjct: 541 MKKLDSAKMEEYANRALTECSTAEEVLELSKEYVSED 577

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1455 

A DNA sequence (GBSx1541) was identified in S.agalactiae <SEQ ID 4469> which encodes the amino
acid sequence <SEQ ID 4470>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1421 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to a protein from S.bovis:

>GP:BAA78048 GB:AB027569 histidine containing protein [Streptococcus bovis] 
Identities = 86/87 (98%) , Positives = 87/87 (99%) 

Query: 1 MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD 60
MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD

Sbjct: 1 MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD 60 

Query: 61 VTISAEGADADDAIAAIEETMTKEGLA 87 
VTISAEGADADDA+AAIEETMTKEGLA
Sbjct: 61 VTISAEGADADDALAAIEETMTKEGLA 87
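The identity count in the alignment header can be checked directly against the printed rows. A minimal Python sketch, assuming ungapped rows and a hypothetical helper `compare_rows`, counts identical columns in the two rows of the GBS/S.bovis alignment above and lists the differing positions; for these rows it should reproduce the reported 86/87 identities. Scoring the "Positives" would additionally need a substitution matrix (blastp commonly uses BLOSUM62), which this sketch leaves out.

```python
from typing import List, Tuple


def compare_rows(query: str, sbjct: str, start: int) -> Tuple[int, List[Tuple[int, str, str]]]:
    """Count identical columns in two equal-length aligned rows.

    `start` is the residue number printed before the row. Positions are
    column-based, which equals residue numbering only for ungapped rows
    such as these. Gap characters ("-") are never counted as identities.
    Returns the identity count and (position, query, sbjct) mismatches.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identities = 0
    mismatches = []
    for offset, (q, s) in enumerate(zip(query, sbjct)):
        if q == s and q != "-":
            identities += 1
        else:
            mismatches.append((start + offset, q, s))
    return identities, mismatches


# Rows of the GBS / S. bovis alignment above (Query 1-87 vs Sbjct 1-87).
rows = [
    ("MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD",
     "MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD", 1),
    ("VTISAEGADADDAIAAIEETMTKEGLA",
     "VTISAEGADADDALAAIEETMTKEGLA", 61),
]

total_ids, total_len, all_mismatches = 0, 0, []
for q, s, start in rows:
    ids, mm = compare_rows(q, s, start)
    total_ids += ids
    total_len += len(q)
    all_mismatches.extend(mm)

print(f"Identities = {total_ids}/{total_len}")
print("Mismatches:", all_mismatches)
```

The single mismatch reported is the I/L substitution at position 74, which is why the consensus line shows "+" there rather than a letter.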

A related DNA sequence was identified in S.pyogenes <SEQ ID 4471> which encodes the amino acid
sequence <SEQ ID 4472>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1421 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 86/87 (98%) , Positives = 87/87 (99%)
Query: 1 MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD 60

MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD
Sbjct: 1 MASKDFHIVAETGIHARPATLLVQTASKFASDITLDYKGKAVNLKSIMGVMSLGVGQGAD 60

Query: 61 VTISAEGADADDAIAAIEETMTKEGLA 87 

VTISAEGADA+DAIAAIEETMTKEGLA
Sbjct: 61 VTISAEGADAEDAIAAIEETMTKEGLA 87 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1456 

A DNA sequence (GBSxl542) was identified in S.agalactiae <SEQ ID 4473> which encodes the amino 
acid sequence <SEQ ID 4474>. This protein is predicted to be glutaredoxin-like protein nrdh (b2673). 
Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.4532 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA63372 GB:X92690 glutaredoxin-like protein [Lactococcus
lactis] 

Identities = 42/70 (60%) , Positives = 53/70 (75%) 



Query: 4 ITVFSKNNCMQCKMTKKFr.UQHGADFEEINIDEKPEK] 

+TV+SKNNCMQCKM KK+L +H F EINIDE+PE +E V +GF AAPVI + FS 
Sbjct: 2 VTVYSKNNCMQCKIVIVKKWLSEHEIAFNEINIDEQ 61 

Query: 64 GFQPSKLKEL 73 

GF+PS+L +L 
Sbjct: 62 GFRPSELAKL 71 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4475> which encodes the amino acid 
sequence <SEQ ID 4476>. Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4606 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 55/71 (78%) , Positives = 68/71 (94%) 

ITV+SKNNCMQCKMTKKFL+QHG +F+EINIDE PEK++YVK+LGF++APVIEA N+VFS

Sbjct: 

Query: 64 GFQPSKLKELV 74 

GFQP+KLKEL+ 
Sbjct: 73 GFQPAKLKSLI 83 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1457 

A DNA sequence (GBSx1543) was identified in S.agalactiae <SEQ ID 4477> which encodes the amino
acid sequence <SEQ ID 4478>. This protein is predicted to be ribonucleotide reductase subunit R1E (nrdE). 
Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3676 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD41035 GB:AF112535 ribonucleotide reductase alpha-chain 
[Corynebacterium glutamicum] 
Identities = 366/701 (52%), Positives = 488/701 (69%), Gaps = 19/701 (2%) 

Query: 23 NGQIPLHKDKEALTAFFKENVQPNSKAFDSITDKIAYLLKYDYLEEAFLNKYRPEFIEEL 82

NG+I KD+EA +F ++V N+ F +4- +KI YL++ Y + L+KY +FI++L 
Sbjct: 22 NGKIQFEKDREAANQYFLQHVNQNTVFFHNLQEKIDYLVENKYYDPIVLDKYDFQFIKDL 81 

Query: 83 STKLFDKKFRFKSFMAAYKFYQQYALKTNDGEYYLESIEDRVLFNALYFADGDEELATDL 142

+ + KFRF+SF+ AYK+Y Y LKT DG YLE EDRV AL ADGD LA +L 
Sbjct: 82 FKRAYGFKFRFQSFLGAYKYYTSYTLKTFDGRRYLERFEDRVCMVALTLADGDRALAENL 141 

Query: 143 ALEMISQRYQPATPSFLNAGRSRRGELVSCFLIQVTDDMNAIGRSINSALQLSRIGGGVG 202

E++S R+QPATP+FLN+G+++RGE VSCFL+++ D+M +IGRSINSALQLS+ GGGV 
Sbjct: 142 VDEIMSGRFQPATPTFLNSGKAQRGEPVSCFLLRIEDNMESIGRSINSALQLSKRGGGVA 201 

Query: 203 ISLSNLREAGAPIKGFAGAASGVVPVMKLFEDSFSYSNQLGQRQGAGVVYLDVFHPDIIS 262

+ LSNLREAGAPIK +SGV+PVMKL ED+FSY+NQLG RQGAG VYL+ HPDI+S 

Sbjct: 202 LLLSNLREAGAPIKKIENQSSGVIPVMKLLEDAFSYANQLGARQGAGAVYLNAHHPDILS 261 

Query: 263 FLSTKKENADEKVRVKTLSLGITVPDKFYELARNNQEMYLFSPYSIEREYGVPFSYIDIT 322

FL TK+ENADEK+R+KTLSLG+ +PD +ELA+ N +MYLFSPY +ER YG PF+ + IT 
Sbjct: 262 FLDTKRENADEKIRIKTLSLGVVIPDITFELAKRNDDMYLFSPYDVERIYGKPFADVSIT 321

Query: 323 EKYDELVANPNITKTKINARDLETEISKLQQESGYPYIINIDTANRTNPVDGKIIMSNLC 382 

E YDE+V + I KTKINAR ++++Q ESGYPYI+ DT N +NP++G+I SNLC 

Sbjct: 322 EHYDEMVDDDRIRKTKINARQFFQTIAEIQFESGYPYIMYEDTVNASNPIEGRITHSNLC 381 

Query: 383 SEILQVQKPSLINDAQEYLEMGTDISCNLGSTNVLNMMTSPDFGKSIKTMTRALTFVTDS 442

SEILQV PS ND Y E+G DISCNLGS NV M SP+F K+I+T R LT V++ 
Sbjct: 382 SEILQVSTPSEFNDDLTYAEVGEDISCNLGSLNVAMAMDSPNFEKTIETAIRGLTAVSEQ 441 

Query: 443 SNIEAVPTIKNGNAQAHTFGLGAMGLHSYLAKNHIEYGSPESIEFTDIYFMLMNYWTLVE 502

++I++VP+I+ GN AH GLG M LH Y + H+ YGS E+++FT+ YF + Y L 
Sbjct: 442 TSIDSVPSIRKGNEAAHAIGLGQMNLHGYFGREHMHYGSEEALDFTNAYFAAVLYQCLRA 501 

Query: 503 SNNIARERQTTFVGFEKSKYADGTYFDKYVSGKFVPQSDKVKSLFA--NHFIPEAKDWEN 560

SN IA ER F FE SKYA G YFD + + F P+SDKVK LFA N P +DW 
Sbjct: 502 SNKIATERGERFKNFENSKYATGEYFDDFDANDFAPKSDKATKELFAKSNIHTPTVEDWAA 561 

Query: 561 LRYAVMKDGLYHQNRLAVAPNGSISYINDCSASIHPITQRIEERQEKKIGKIYYPANGLA 620 

L+ VM+ GL+++N AV P GSISYIN+ ++SIHPI +IE R+E KIG++YYPA + 
Sbjct: 562 LKADVMEHGLFNRNLQAVPPTGSISYINNSTSSIHPIASKIEIRKEGKIGRVYYPAPHMD 621

Query: 621 TDTIPYYTSAYDMDMRKVIDVYAAATEHVDQGLSMTLFLRSELPKELYEWKTESKQTTRD 680

D + Y+ AY++ K+ID YA AT++VDQGLS+TLF + TTRD 
Sbjct: 622 NDNLEYFEDAYEIGYEKIIDTYAVATKY\T:Q3LSLTLFFK DTATTRD 668 
Query: 681 LSILRNYAFNKGVKSIYYI--RTFTDDGSEVGANQCESCVI 719

++ + YA+ KG+K++YYI R +G+EV + C SC++ 
Sbjct: 669 INRAQIYAWRKGIKTLYYIRLRQVALEGTEV--DGCVSCML 707 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4479> which encodes the amino acid 
sequence <SEQ ID 4480>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence



An alignment of the GAS and GBS proteins is shown below. 

Identities = 628/719 (87%), Positives = 682/719 (94%)

Query: 1 MSLKNIGDVSYFRLNNEINRPVNGQIPLHKDKEALTAFFKENVQPNSKAFDSITDKIAYL 60

MSLK++GD+SYFRLNNEINRPVNG+IPLHKDKEAL AF ENV PN+ +F SIT+KI YL 
Sbjct: 1 MSLKDLGDISYFRLNNEINRPVNGKIPLHKDKEALKAFSAENVLPNTMSFTSITEKIEYL 60

Query: 61 LKYDYLEEAFLNKYRPEFIEELSTKLFDKKFRFKSFMAAYKFYQQYALKTNDGEYYLESI 120 

+ DY+E AF+ KYRPEFI EL + + + FRFKSFMAAYKFYQQYALKTNDGE+YLE++ 
Sbjct: 61 ISNDYIESAFIQKYRPEFITELDSIIKSENFRFKSFMAAYKFYQQYALKTNDGEHYLENL 120 

Query: 121 EDRVLFNALYFADGDEELATDLALEMISQRYQPATPSFLNAGRSRRGELVSCFLIQVTDD 180

EDRVLFNALYFADG E+LA DLA+EMI+QRYQPATPSFLNAGRSRRGELVSCFLIQVTDD 
Sbjct: 121 EDRVLFNALYFADGQEDLAKDLAVEMINQRYQPATPSFLNAGRSRRGELVSCFLIQVTDD 180 

Query: 181 MNAIGRSINSALQLSRIGGGVGISLSNLREAGAPIKGFAGAASGVVPVMKLFEDSFSYSN 240

MN+IGRSINSALQLSRIGGGVGI+LSNLREAGAPIKG+AGAASGVVPVMKLFEDSFSYSN 
Sbjct: 181 MNSIGRSINSALQLSRIGGGVGITLSNLREAGAPIKGYAGAASGVVPVMKLFEDSFSYSN 240

Query: 241 QLGQRQGAGVVYLDVFHPDIISFLSTKKENADEKVRVKTLSLGITVPDKFYELARNNQEM 300

QLGQRQGAGVVYL+VFHPDII+FLSTKKENADEKVRVKTLSLGITVPDKFYELAR N++M 
Sbjct: 241 QLGQRQGAGVWI^FHPDIIAFLSTKKEI^EKmVKTLSLGITVPDKFYELARKNEDM 300 

Query: 301 YLFSPYSIEREYGVPFSYIDITEKYDELVANPNITKTKINARDLETEISKLQQESGYPYI 360 
YLFSPY++E+EYG+PF+Y+DIT YDELVANP ITKTKI ARDLETEISKLQQESGYPYI 
Sbjct: 301 YLFSPYNVEKEYGIPFNYLDITKMYDELVANPKITKTKIKARDLETEISKLQQESGYPYI 360

Query: 361 INIDTANRTNPVDGKIIMSNLCSEILQVQKPSLINDAQEYLEMGTDISCNLGSTNVLNMM 420
INIDTAN+ NP+DGKIIMSNLCSEILQVQ PSLINDAQE++EMGTDISCNLGSTN+LNMM 
Sbjct: 361 INIDTANKANPIDGKIIMSNLCSEILQVQTPSLINDAQEFATEMGTDISCmGSTNIIiISMvl 420 

Query: 421 TSPDFGKSIKTMTRALTFVTDSSNIEAVPTIKNGNAQAHTFGLGAMGLHSYLAKNHIEYG 480

TSPDFG+SIKTMTRALTFVTDSS+IEAVPTIK+GN+QAHTFGLGAMGLHSYLA++HIEYG 
Sbjct: 421 TSPDFGRSIKTMTRALTFVTDSSSIEAVPTIKHGNSQAHTFGLGAMGLHSYLAQHHIEYG 480 

Query: 481 SPESIEFTDIYFMLMNYWTLVESNNIARERQTTFVGFEKSKYADGTYFDKYVSGKFVPQS 540 

SPESIEFTDIYFML+NYWTLVESNNIARERQTTFVGFE SKYA+G+YFDKYV+G FVP+S 
Sbjct: 481 SPESIEFTDIYFMLLNYWTLVESNNIARERQTTFVGFENSKYANGSYFDKYVTGHFVPKS 540

Query: 541 DKVKSLFAWHFIPEAKDWENLRYAVMKDGLYHQNRLAVAPNGSISYINDCSASIHPITQR 600 

D VK LF +HFIP+A DWE LR AV KDGLYHQNRLAVAPNGSISYINDCSASIHPITQR 
Sbjct: 541 DL-VKDLFECDHFIPQASDWEALRDAVQKDGLYHQNRLAVAPNGSISYINDCSASIHPITQR 600 

Query: 601 IEERQEKKIGKIYYPANGIATDTIPYYTSAYDMDMRKVIDVYAAATEHVDQGLSMTLFLR 660 

IEERQEKKIGKIYYPANGL+TDTIPYYTSAYDMDMRKVIDVYAAATEHVDQGLS+TLFLR 
Sbjct: 601 IEERQEKKIGKIYYPANGLSTDTIPYYTSAYDMDMRKVIDVYAAATEHVDQGLSLTLFLR 660 

Query: 661 SELPKELYEWKTESKQTTRDLSILRNYAFNKGVKSIYYIRTFTDDGSEVGANQCESCVI 719

SELP ELYEWKT+SKQTTRDLSILRNYAFNKG+KSIYYIRTFTDDG EVGANQCESCVI 
Sbjct: 661 SELPMELYEWKTQSKQTTRDLSILRNYAFNKGIKSIYYIRTFTDDGEEVGANQCESCVI 719



Final Results

bacterial cytoplasm --- Certainty=0.4241 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1458 

A DNA sequence (GBSx1544) was identified in S.agalactiae <SEQ ID 4481> which encodes the amino
acid sequence <SEQ ID 4482>. This protein is predicted to be ribonucleotide reductase subunit R2F (nrdB). 
Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.4583 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9753> which encodes amino acid sequence <SEQ ID 9754> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC14561 GB:AF050168 ribonucleotide diphosphate reductase small
subunit [Corynebacterium ammoniagenes] 
Identities = 166/313 (53%), Positives = 215/313 (68%), Gaps = 1/313 (0%) 

Query: 10 EAINWNEIEDVIDKSTWEKLTEQFWLDTRIPLSNDLDDWRKLSAQEKDLVGKVFGGLTLL 69

+AINWN ID D W++LT FWL +IP+SND+ W K++ QE+ +VF GLTLL 
Sbjct: 17 KAINWNVIPDEKDLEVWDRLTGNFWLPEKIPVSNDIQSWNKMTPQEQLATMRVFTGLTLL 76

Query: 70 DTMQSETGVEAIRADVRTPHEEAVLNNIQFMESVHAKSYSSIFSTLNTKSEIEEIFEWTN 129

DT+Q G ++ DV T HEE V NI FMESVHAKSYS+IF TL + +I E F W+ 
Sbjct: 77 DTIQGTVGAISLLPDVETMHEEGVYTNIAFMESVHAKSYSNIFMTLASTPQINEAFRWSE 136 

Query: 130 NNEFLQEKARIINDIYANGNALQKKVASTYLETFLFYSGFFTPLYYLGNNKLANVAEIIK 189

NE LQ KA+II Y + L+KKVAST LE+FLFYSGF+ P+Y KL N A+II+ 

Sbjct: 137 ENENLQRKAKIIMSYYNGDDPLKKKVASTLLESFLFYSGFYLPMYLSSPAKLTNTADIIR 196 

LIIRDESVHG YIGYK+Q G +L E EQE ++ + +DL+Y LYENE +YT+ +YD +GW 

Sbjct: 197 LIIRDESVHGYYIGYKYQQGVKKLSEAEQEEYKAYTFDLMYDLYENEIEYTEDIYDDLGW 256

Query: 250 TEEVMTFLRYNANKALMNLGQDPLFPDTANDVNPIVMNGIS-TGTSNHDFFSQVGNGYLL 308

TE+V FLRYNANKAL NLG + LFP V+P +++ +S NHDFFS G+ Y++ 

Sbjct: 257 TEDVKRFLRYNANKALNNLGYEGLFPTDETKVSPAILSSLSPNADENHDFFSGSGSSYVI 316 

Query: 309 GSVEAMHDDDYNY 321 

G E DDD+++ 
Sbjct: 317 GKAEDTTDDDWDF 329 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4483> which encodes the amino acid 
sequence <SEQ ID 4484>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.4583 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 315/319 (98%), Positives = 316/319 (98%)

Query: 5 MTTYYEAINWNEIEDVIDKSTWEKLTEQFWLDTRIPLSNDLDDWRKLSAQEKDLVGKVFG 64

MTTYYEAINWNEIEDVIDKSTWEKLTEQFWLDTRIPLSNDLDDWRKLS QEKDLVGKVFG 
Sbjct: 1 MTTYYEAINWNEIEDVIDKSTWEKLTEQFWLDTRIPLSNDLDDWRKLSLQEKDLVGKVFG 60

Query: 65 GLTLLDTMQSETGVEAIRADVRTPHEEAVLNNIQFMESVHAKSYSSIFSTLNTKSEIEEI 124

GLTLLDTMQSETGVEAIRADVRTPHEEAVLNNIQFMESVHAKSYSSIFSTLNTK EIEEI 
Sbjct: 61 GLTLLDTMQSETGVEAIRADVRTPHEEAVLNNIQFMESVHAKSYSSIFSTLNTKKEIEEI 120

Query: 125 FEWTNNNEFLQEKARIINDIYANGNALQKKVASTYLETFLFYSGFFTPLYYLGNNKLANV 184

FEWTMSDSn2FLQEKARIIITOIYANG4ALQKKVASTYLETFLF^ 
Sbjct: 121 FEWTNNNEFLQEKARIINDIYANGDALQKKVASTYLETFLFYSGFFTPLYYLGHNKLANV 180 

Query: 185 AEIIKLIIRDESVHGTYIGYKFQLGFNELPEDEQENFRDWMYDLLYQLYENEEKYTKTLY 244

AEIIKLIIRDESVHGTYIGYKFQLGFNELPEDEQENFRDWMYDLLYQLYENEEKYTKTLY 
Sbjct: 181 AEIIKLIIRDESVHGTYIGYKFQLGFNELPEDEQENFRDWMYDLLYQLYENEEKYTKTLY 240

Query: 245 DGVGWTEEVMTFLRYNANKALMNLGQDPLFPDTANDVNPIVMNGISTGTSNHDFFSQVGN 304

DGVGWTEEVMTFLRYNANKALMNLGQDPLFPDTANDVNPIVMNGISTGTSNHDFFSQVGN 
Sbjct: 241 DGVGWTEEVMTFLRYNANKALMNLGQDPLFPDTANDVNPIVMNGISTGTSNHDFFSQVGN 300

Query: 305 GYLLGSVEAMHDDDYNYGL 323 

GYLLGSVEAM DDDYNYGL 
Sbjct: 301 GYLLGSVEAMSDDDYNYGL 319 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1459 

A DNA sequence (GBSx1545) was identified in S.agalactiae <SEQ ID 4485> which encodes the amino
acid sequence <SEQ ID 4486>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 50 - 66 ( 50 - 66) 



Final Results 

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1460

A DNA sequence (GBSx1546) was identified in S.agalactiae <SEQ ID 4487> which encodes the amino
acid sequence <SEQ ID 4488>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-14.38 Transmembrane 176 - 192 ( 168 - 201)
INTEGRAL Likelihood = -4.57 Transmembrane 25 - 41 ( 22 - 42)
INTEGRAL Likelihood = -3.88 Transmembrane 94 - 110 ( 94 - 112)
INTEGRAL Likelihood = -1.49 Transmembrane 70 - 86 ( 70 - 86)
INTEGRAL Likelihood = -1.01 Transmembrane 128 - 144 ( 128 - 144)

Final Results
bacterial membrane --- Certainty=0.6753 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9751> which encodes amino acid sequence <SEQ ID 9752>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15077 GB:Z99119 similar to hypothetical proteins [Bacillus subtilis]
, Gaps = 4/134 (2%) 

Query: 16 MSKNNNTTCLIETAIFAALAMALSMIP DFASWFTPSFGAIPLILFALRRGTKYGLF 71 

M+++ LIE AI A A+ L ++ + S IP+ L + R G K GL 

Sbjct: 1 MNQSKQLVRLIEIAIMTAAAVILDIVSGMFLSMPQGGSVSIMMIPIFLISFRWGVKAGLT 60

Query: 72 AGLIWGLLHFVLSKVYYLSLSQVFIEYILAFISMGLAGVFSAKFKDALSSSSKTKALSLA 131

GL+ GL+ + ++ Q+ ++YI+AF ++G++G F++ + A S +K K + 

Sbjct: 61 TGLLTGLVQIAIGNLFAQHPVQLLLDYIVAFAAIGISGCFASSWKAAVSKTKGKLIVSV 120 

Query: 132 LSGAILATLVRYVWHYIAGVIFWASYAPKGMSATLYSLSVNGTAGLLTLFFWISIIILV 191 

+S + +L+RY H I+G +F+ S+APKG +YSL+ NT + + I + +L 

Sbjct: 121 VSAVFIGSLLRYAAHVISGAVFFGSFAPKGTPVWIYSLTYNATYMVPSFIICAIVLCLLF 180 

Query: 192 ISYP 195 
++ P 

Sbjct: 181 MTAP 184 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4489> which encodes the amino acid
sequence <SEQ ID 4490>. Analysis of this protein sequence reveals the following:

Possible site: 20 
>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -9.34 Transmembrane 162 - 178 ( 156 - 183)
INTEGRAL Likelihood = -9.34 Transmembrane 110 - 126 ( 107 - 130)
INTEGRAL Likelihood = -1.22 Transmembrane 55 - 71 ( 55 - 71) 

Final Results 

bacterial membrane --- Certainty=0.4736 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15077 GB:Z99119 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 55/189 (29%), Positives = 100/189 (52%), Gaps = 10/189 (5%) 

Query: 1 MSPNTNVKYLIEAAIFAALAMTLSFIPDFAGWF--SPSYGAIALV IFSLRRGLKY 53

M+ + + LIE AI A A+ L + +G F P G+++++ + S R G+K 

Sbjct: 1 MNQSKQLVRLIEIAIMTAAAVILDIV---SGMFLSMPQGGSVSIMMIPIFLISFRWGVKA 57



3 +F+GS+APKG YS + N T V +F+I 






An alignment of the GAS and GBS proteins is shown below.

Identities = 116/186 (62%), Positives = 138/186 (73%)
Query: 16 MSKNNNTTCLIETAIFAALAMALSMIPDFASWFTPSFGAIPLILFALRRGTKYGLFAGLI 75

MS N N LIE AIFAALAM LS IPDFA WF+PS+GAI L++F+LRRG KYG+ AGLI 
Sbjct: 1 MSPNTNVKYLIEAAIFAALAMTLSFIPDFAGWFSPSYGAIALVIFSLRRGLKYGMLAGLI 60


Query: 76 WGLLHFVLSKVYYLSLSQVFIEYILAFISMGLAGVFSAKFKDALSSSSKTKALSLALSGA 135 

WGLLHFVL KVYYLS+SQVFIEYILAF SMGLAG FS L A+ LA+ + 

Sbjct: 61 WGLLHFVLGKVYYLSMSQVFIEYILAFTSMGLAGSFSDSLIKTLRRQQTFFAVFLAIMAS 120 

Query: 136 ILATLVRYVWHYIAGVIFWASYAPKGMSATLYSLSVNGTAGLLTLFFWISIIILVISYP 195

4- LA VRY+WH+4-AG+IFW SYAPKGMSA YS SVNGTAG+LT + +P 

Sbjct: 121 LIAVTVRYLWHFLAGIIFWGSYAPKGMSAVWYSFSVNGTAGVLTFLITCLALMIALPIHP 180 

Query: 196 SFFLPK 201 
F PK 

Sbjct: 181 QLFDPK 186 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1461

A DNA sequence (GBSx1547) was identified in S.agalactiae <SEQ ID 4491> which encodes the amino
acid sequence <SEQ ID 4492>. Analysis of this protein sequence reveals the following: 



N-terminal signal sequence

INTEGRAL Likelihood = -7.43 Transmembrane 206 - 222 ( 199 - 223)
INTEGRAL Likelihood = -6.64 Transmembrane - 40 ( 19 - 42)
INTEGRAL Likelihood = -6.58 Transmembrane - 77 ( 51 - 78)
INTEGRAL Likelihood = -6.58 Transmembrane - 150 ( 132 - 154)
INTEGRAL Likelihood = -4.62 Transmembrane - 242 ( 224 - 245)
INTEGRAL Likelihood = -3.72 Transmembrane 107 - 123 ( 106 - 125)

Final Results

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9749> which encodes amino acid sequence <SEQ ID 9750> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4493> which encodes the amino acid
sequence <SEQ ID 4494>. Analysis of this protein sequence reveals the following: 



Possible site: 23
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-10.46 Transmembrane - 221 ( 199 - 224)
INTEGRAL Likelihood = -7.59 Transmembrane - 66 ( 50 -
INTEGRAL Likelihood = -7.48 Transmembrane - 32 ( 16 -
INTEGRAL Likelihood = -7.22 Transmembrane
INTEGRAL Likelihood = -3.56
Likelihood = -1.28

Final Results

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below.

Identities = 82/253 (32%), Positives = 149/253 (58%), Gaps = 5/253 (1%)

Query: 6 IKQSDTTFTOIIKSLLIGGFIGAILGSVGALFIIF--GQDKYLSEI--NIVQYFLWVSRI 61 

+K+ +F+R++K L+ G I+G + F+ + G+ +L+ + +++ + ++R+ 
Sbjct: 1 MKKKKNSFLRLLKMSLLSSLAGGIIGGMVGAFLGYHGGRLDHLTFLKDDVINLIILLNRL 60

Query: 62 WIITALFSLIYLYQIQKYQKVFFNVDESQ-SEEIYRQINLRHSYGMTFVSISIVLSIVN 120 

W+ S ++L Q++K V+ ++E SE YRQ+N +H+Y M ++++ +LS+ N 

Sbjct: 61 VWTDLTLSFVFLTQLKKETAVYNTIEEDDISENGYRQLNKKHAYTMIjLIAVASIIiSMCN 120 

Query: 121 TLFNYKLNIFDDSVTLVIPIYDLSLLFVLLGLHIYFLKVYRNIRGIKMTVAPTLKELKNN 180

L L L IP+ D+ LL +++ +K Y IRG + P LKELK+N 

Sbjct: 121 VLLGLTLTNDSQHAMLAIPLLDILLLLMVIPFQALM^KRYNAIRGTDVPYFPNLKELKHN 180 

Query: 181 VLQLDEAELESNYKMCFDIVMNLSGFIFPTIYFVLFFISFVFQKVEIVAIIITTSIHIYI 240

++ LDEAEL++ +K F+ V4+L+G I P++Y +LFF+ +VE+ AI++ I +Y+ 

Sbjct: 181 IMALDEAELQAYHKTSFESVLSLNGVIIPSLYVILFFVYLFTGQVELTAILVLVLIQLYL 240 

Query: 241 LIKSLKAARHFYR 253 

L+KS R FYR 
Sbjct: 241 LVKSATMTRQFYR 253 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1462 

A DNA sequence (GBSx1548) was identified in S.agalactiae <SEQ ID 4495> which encodes the amino
acid sequence <SEQ ID 4496>. Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5172 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1463 

A DNA sequence (GBSx1549) was identified in S.agalactiae <SEQ ID 4497> which encodes the amino
acid sequence <SEQ ID 4498>. Analysis of this protein sequence reveals the following:

3 N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.2059 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76650 GB:AE000440 UDP-D-glucose: (galactosyl) lipopolysaccharide
glucosyltransferase [Escherichia coli K12]
Identities = 70/256 (27%), Positives = 121/256 (46%), Gaps = 14/256 (5%)

Query: 1 ITOLLFSIDDMYVDHFi<™iiYSI.VRQTKNRKI.EIYVLQKr LLKRHTELIQYTQNLEV 56 

+N+ + +D Y+D V + S+V ++ L+ Y++ ++ +L + Q 

Sbjct: 28 IJWAYGVDANTYLDGVGVS ITS I V]jNNRHINIiDFYI IADVYNDGFFQKIAKIiAEQNQLRIT 87 

Query: 57 GYHPIIVGTEVFAQAPTTDRYPDTIYYRLLAHKFLPETLDRILYLDADMLCLNDFSSLYD 116

Y + T+ P T + +Y+RL A + L TLDR4LYLDAD++C DSL 

Sbjct: 88 LYR INTDKLQCLPCTQWSRAMYFRLFAFQLLGLTLDRLLYLDADVVCKGDI SQLLH 144 

Query: 117 MELGDQLYAAASHM , DGKFLDYVNKLRLKNVELESSYFKTGVLLMfflliPAIRKVvHQQTIL 176 

+ L A A+ D + 4 RL + EL YFW+GV+ ++L + L 

Sbjct: 145 LGIjNG---AVAAvvKDvEPMQSKAVSRLSDPELLGQYFNSGVVYLDLKKWADAKLTEKAL 201 

Query: 177 DYIMQNRGRLILPDQDILNGLYANLVKPIPDEIYNYDARYSLIYQLKSRNEWDLEWVINH 236 

+M PDQD++N L + +P E Y+ Y++ +LK + + + +1 
Sbjct: 202 SILMSKDNVYKYPDQDVMNVLLKGMTLFLPRE YNTIYTIKSELKDKTHQNYKKLITE 258 

Query: 237 -TVFLHFAGRDKPWKK 251 

T+ +H+ G KPW K 
Sbjct: 259 STLLIHYTGATKPWHK 274 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1464 

A DNA sequence (GBSx1550) was identified in S.agalactiae <SEQ ID 4499> which encodes the amino
acid sequence <SEQ ID 4500>. Analysis of this protein sequence reveals the following: 



N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.1406 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1465 

A DNA sequence (GBSx1551) was identified in S.agalactiae <SEQ ID 4501> which encodes the amino
acid sequence <SEQ ID 4502>. Analysis of this protein sequence reveals the following: 

Possible site: 54 



>>> Seems to have an uncleavable N-term signal seq










INTEGRAL Likelihood =-10.72 Transmembrane 7 - 23 ( 1 - 28)
INTEGRAL Likelihood = -4.30 Transmembrane 222 - 238 ( 216 - 238)
INTEGRAL Likelihood = -3.66 Transmembrane - 167 ( 140 - 170)
INTEGRAL Likelihood = -3.50 Transmembrane 35 - ( 34 - 58)
INTEGRAL Likelihood = -3.35 Transmembrane 71 - 87 ( 59 -
INTEGRAL Likelihood = -3.29 Transmembrane 113 - 129 ( 113 -
INTEGRAL Likelihood = -2.81 Transmembrane 170 - 186 ( 168 -
INTEGRAL Likelihood = -2.71 Transmembrane 158 - 214 ( 197 -

Final Results

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07774 GB:AP001520 unknown conserved protein [Bacillus halodurans] 
Identities = 84/242 (34%), Positives = 147/242 (60%), Gaps = 16/242 (6%)

Query: 1 ^GLGTVIWILIIVGGEVGLFLKNFLKESLQKSLMQAMGVAVLFISISGVLEKMMLVEK 60 

MV +GTV+N I++ +GL +KN + E ++ +LMQA+G+A++ + + KM L + 

Sbjct: 1 MVLIGTVVNGAAIVIAALIGLLVKN- 1 PERVKTTLMQAIGLAIVLLGV KMGLQTE 54 

Query: 61 SHLISNHTWMMIITIALGTOLGELLSLDSYIDKFGOTLKQKTGSGNDIKFVEAFVTSTCT 120 

LI +1 +L +G V+GE+++L+ +D G +++ K G D AFVT+T 

Sbjct: 55 QFLI VICSLVIGGVIGEMINLEKRLDHLGRWIESKVGGKKDGSIATAFVTTTLI 108 

Query: 121 VCIGAMAVVGSIQDGIAADHSILFAKGMLDMIIIAIMTVSLGKGALFSALPVALLQGSLT 180

+GAMAV+G++ G+ DHS+L K +LD + + T +LG G LFSA+PV L QGS+ 
Sbjct: 109 YVVGAMAVLGALDSGLRGDHSVLLTKALLDGFLAILFTSTLGIGVLFSAIPVVLYQGSIA 168

Query: 181 IVAF FMGSLLNPSSLDyiNLVGNMLIFCVGVNLLFNLNIKVINMLPAIIIAILWGS 236 

+ A ++ + L S + ++ G ++I +G+NLL +NI+V N+LP++++ + + 

Sbjct: 169 LFASQIDQYVPTALMDSFITEMSATGGVMIVAIGLNIjLNWNIRVANLLPSLVIVAVLVT 228 

Query: 237 FI 238 
F+ 

Sbjct: 229 FV 230 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1466 

A DNA sequence (GBSx1552) was identified in S.agalactiae <SEQ ID 4503> which encodes the amino
acid sequence <SEQ ID 4504>. This protein is predicted to be alanyl-tRNA synthetase (alaS). Analysis of 
this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.41 Transmembrane 805 - 821 ( 804 - 822) 

Final Results

bacterial membrane --- Certainty=0.2763 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04986 GB:AP001511 alanyl-tRNA synthetase [Bacillus halodurans]
Identities = 482/885 (54%), Positives = 618/885 (69%), Gaps = 27/885 (3%)

Query: 1 MKELSSAQIRQMWLDFWKSKGHSVEPSANLVPVNDPTLLWINSGVATLKKYFDGSVIPEN 60 

MK L+SAQ+RQM+LDF+K KGH VEPSA+LVP +DP+LLWINSGVATLKKYFDG VIPEN 
Sbjct: 1 MKYLTSAQVRQMFLDFFKEKGHDVEPSASLVPHDDPSLLWINSGVATLKKYFDGRVIPEN 60

Query: 61 PRITNAQKSIRTNDIENVGKTARHHTMFEMLGNFSIGDYFRDEAIEWGFELLTSPEWFDF 120 

PRITNAQKSIRTNDIENVGKTARHHT FEMLGNFSIGDYF++EAIEW +E LTS +W F 
Sbjct: 61 PRITNAQKSIRTNDIENVGKTARHHTFFEMLGNFSIGDYFKEEAIEWAWEFLTSEKWIGF 120



Query: 121 PKDKLYMTYYPDDKDSYNRWIA-CGVEPSHLVPIEDNFWEIGAGPSGPDTEIFFDRGEDF 179
K+KL +T +P+D ++Y+ W G+ ++ +E NFW+IG GPSGP+TEIF+DRG ++ 
Sbjct: 121 DKEKLSVTVHPEDDEAYSYWKEKIGIPEERIIRLEGNFWDIGEGPSGPNTEIFYDRGPEY 180 

Query: 180 DPENIGLRLLAEDIENDRYIEIWNIVLSQFNADPAVPRSEYKELPNKNIDTGAGL 234 

DPE L ENDRY+E+WN+V SQFN +P Y LP KNIDTG GL 

Sbjct: 181 GDQPNDPE- LYPGGENDRYLEVWNLVFSQENHNPD GSYTPLPKKNIDTGMGL 231 

Query: 235 ERLAAVMQGAKTNFETDLFMPIIREVEKLSGKTYDPDGD-NMSFKVIADHIRALSFAIGD 293 

ER+ +V+Q TNFETDLFMPIIR EK+SG Y + ++SFKVIADHIR ++FAIGD 
Sbjct: 232 ERMVSVIQNVPTNFETDLFMPIIRATEKISGTEYGSHHEADVSFKVIADHIRTVTFAIGD 291 

Query: 294 GALPGNEGRGYVLRRLLRRAVMHGRRLGINETFLYKLVPTVGQIMESYYPEVLEKRDFIE 353

GALP NEGRGYVLRRLLRRAV + +++GI+ F+Y+LVP VG IM +YPEV EK FI+ 
Sbjct: 292 GALPSNEGRGYVLRRLLRRAVRYAKQIGIDRPFMYELVPVVGDIMVDFYPEVKEKAAFIQ 351



Query: 414 DAGYKIDHEGFKSAMKEQQDRARAAWKGGSMGMQNETLAGIVEESRF-EYDTYSLESSL 472 

+ G ++D +GF++ M+ Q++RAR A + GSM +Q+E L I +S F Y S E+++ 
Sbjct: 412 EQGLQVDLDGFEAEMERQRERARTARQQAGSMQVQDEVLGQITVDSTFIGYKQLSTETTI 471 

1+ D + V GQ A ++ +TPFYAE GGQVAD G+I+ G V V DVQKAP 
Sbjct: 472 ETIVLDKTVADYVGAGQEAKVILKETPFYAESGGQVADKGIIRGANGFAV--VSDVQKAP 529

Query: 532 NGQPLHTVNVL-ASLSVGTNYTLEINKERRLAVEKNHTATHLLHAALHNVIGEHATQAGS 590 

NGQ LHTV V +L V + + R + KNHTATHLLH AL +V+GEH QAGS 

Sbjct: 530 NGQHLHTVIVKEGTLQVNDQVQAIVEETERSGIVKNHTATHLLHRALKDVLGEHVNQAGS 589 

Query: 591 LMEEEFLRFDFTHFEAVSNEELRHIEQEVNEQIWNDLTITTTETDVETAKEMGAMALFGE 650

L EE LRFDF+HF V++EE IE+ VNE+IW + + + ++ AK +GAMALFGE 
Sbjct: 590 LVSEERLRFDFSHFGQVTDEEKEKIERIVNEKIWQAIKVNISTKTLDEAKAIGAMALFGE 649 






Query: 771 EAKGVRFIASQVDVADAGALRTFADNWKQKDYSDVLVLVAAIGEKVNVLVASKTKDV--- 827

+ +GV +A + AD LR+ D KQ+ S V+VL A KVN+ VA TKD+ 
Sbjct: 770 KIEGVPVLAKAISGADMDGLRSIVDKLKQEIPSVVIVLGTASEGKVNI-VAGVTKDLINK 828

Query: 828 --HAGNMIKGLAPIVAGRGGGKPDMAMAGGSDASKIAELLAAVAE 870

HAG ++K +A G GGG+PDMA AGG K+ + L+ V E 

Sbjct: 829 GYHAGKLVKEVATRCGGGGGGRPDMAQAGGKQPEKLQDALSFVYE 873



A related DNA sequence was identified in S.pyogenes <SEQ ID 4505> which encodes the amino acid 
sequence <SEQ ID 4506>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.41 Transmembrane 805 - 821 ( 804 - 822) 

Final Results 

bacterial membrane --- Certainty=0.2763 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 862/870 (99%), Positives = 864/870 (99%)



WO 02/34771 PCT/GB01/04789 
-1619- 

Query: 1 MKELSSAQIRQMWLDFWKSKGHSVEPSANLVPVNDPTLLWINSGVATLKKYFDGSVIPEN 60 

MKELSSAQIRQMWLDFWKSKGH VEPSANLVPVNDPTLLWINSGVATLKKYFDGSVIPEN 
Sbjct: 1 MKELSSAQIRQMWLDFWKSKGHCTOPSAOTjVPVNDPTLLWINSGVATLKKYFDGSVIPEN 60 

Query: 61 PRITNAQKSIRTNDIENVGKTARHHTMFEMLGNFSIGDYFRDEAIEWGFELLTSPEWFDF 120

PRITNAQKSIRTNDIENVGKTARHHTMFEMLGNFSIGDYFRDEAIEWGFELLTSP+WFDF 
Sbjct: 61 PRITNAQKSIRTNDIENVGKTARHHTMFEMLGNFSIGDYFRDEAIEWGFELLTSPDWFDF 120 

Query: 121 PKDKLYMTYYPDDKDSYNRWIACGVEPSHLVPIEDNFWEIGAGPSGPDTEIFFDRGEDFD 180 

PKDKLYMTYYPDDKDSYNRWIACGVEPSHLVPIEDNFWEIGAGPSGPDTEIFFDRGEDFD 
Sbjct: 121 PKDKLYMTYYPDDKDSYNRWIACGVEPSHLVPIEDNFWEIGAGPSGPDTEIFFDRGEDFD 180

Query: 181 PENIGLRLLAEDIENDRYIEIWNIVLSQFNADPAVPRSEYKELPHKNIDTGAGLERLAAV 240

PENIGLRLLAEDIENDRYIEIWNIVLSQFNADPAVPRSEYKELPHKNIDTGAGLERLAAV 
Sbjct: 181 PENIGLRLLAEDIENDRYIEIWNIVLSQFNADPAVPRSEYKELPNKNIDTGAGLERLAAV 240

Query: 241 MQGAKTNFETDLFMPIIREVEKLSGKTYDPDGDNMSFKVIADHIRALSFAIGDGALPGNE 300 

MQGAKTNFETDLFMPIIREVEKLSGKTYDPDGDNMSFKVIADHIRALSFAIGDGALPGNE 
Sbjct: 241 MQGAKTNFETDLFMPIIREVEKLSGKTYDPDGDNMSFKVIADHIRALSFAIGDGALPGNE 300

Query: 301 GRGYVLRRLLRRAVMHGRRLGINETFLYKLVPTVGQIMESYYPEVLEKRDFIEKIVKREE 360

GRGYVLRRLLRRAVMHGRRLGINETFLYKLVPTVGQIMESYYPEVLEKRDFIEKIVKREE 
Sbjct: 301 GRGYVLRRLLRRAVMHGRRLGINETFLYKLVPTVGQIMESYYPEVLEKRDFIEKIVKREE 360

Query: 361 ETFARTIDAGSGHLDSLLAQLKAEGKDTLEGKDIFKLYDTYGFPVELTEELAEDAGYKID 420

ETFARTIDAGSGHLDSLLAQLKAEGKDTLEGKDIFKLYDTYGFPVELTEELAEDAGYKID 
Sbjct: 361 ETFARTIDAGSGHLDSLLAQLKAEGKDTLEGKDIFKLYDTYGFPVELTEELAEDAGYKID 420 

Query: 421 HEGFKSAMKEQQDRARAAWKGGSMGMQNETLAGIVEESRFEYDTYSLESSLSVIIADNE 480 

HEGFKSAMKEQQDRARAAWKGGSMGMQNETLAGIVEESRFEYDTYSLESSLSVIIADNE 
Sbjct: 421 HEGFKSAMKEQQDRARAAWKGGSMGMQNETLAGIVEESRFEYDTYSLESSLSVIIADNE 480 

Query: 481 RTEAVSEGQALLVFAQTPFYAEMGGQVADHGVIKNDKGDTVAEVVDVQKAPNGQPLHTVN 540

RTEAVSEGQALLVFAQTPFYAEMGGQVAD G IKNDKGDTVAEVVDVQKAPNGQPLHTVN 
Sbjct: 481 RTEAVSEGQALLVFAQTPFYAEMGGQVADTGRIKNDKGDTVAEVVDVQKAPNGQPLHTVN 540



Query: 541 \ ----- 

VIASLSVGTISrYTLEINKERRIAVEKiraTATHI^HAALHOTIGEBIATQAGSLlIEEEFLRFD 
Sbjct: 541 VIJ^I^VGTOTYTLEINKERRIAWKiraTATHLLHAALHlWIGEHATQAGSIJIEEEFLRFD 600 

Query: 601 FTHFEAVSNEELRHIEQEVNEQIWNDLTITTTETDVETAKEMGAMALFGEKYGKVVRVVQ 660

FTHFEAVSNEELRHIEQEVNEQIWN LTITTTETDVETAKEMGAMALFGEKYGKVVRVVQ 
Sbjct: 601 FTHFEAVSNEELRHIEQEVNEQIWNALTITTTETDVETAKEMGAMALFGEKYGKVVRVVQ 660

Query: 661 IGNYSVELCGGTHLNNSSEIGLFKIVKEEGIGSGTRRIIAVTGRQAFEAYRNQEDALKEI 720

IGNYSVELCGGTHLNNSSEIGLFKIVKEEGIGSGTRRIIAVTGRQAFEAYRNQEDALKEI 
Sbjct: 661 IGNYSVELCGGTHLNNSSEIGLFKIVKEEGIGSGTRRIIAVTGRQAFEAYRNQEDALKEI 720 

Query: 721 AATVKAPQLKDAAAKVQALSDSLRDLQKENVELKEKAAAAAAGDVFKDIQEAKGVRFIAS 780

AATVKAPQLKDAAAKVQALSDSLRDLQKEN ELKEKAAAAAAGDVFKD+QEAKGVRFIAS 
Sbjct: 721 AATVKAPQLKDAAAKVQALSDSLRDLQKENAELKEKAAAAAAGDVFKDVQEAKGVRFIAS 780

Query: 781 QVDVADAGALRTFADNWKQKDYSDVLVLVAAIGEKVNVLVASKTKDVHAGNMIKGLAPIV 840

QVDVADAGALRTFADNWKQKDYSDVLVLVAAIGEKVNVLVASKTKDVHAGNMIK LAPIV 
Sbjct: 781 QVDVADAGALRTFADNWKQKDYSDVLVLVAAIGEKVNVLVASKTKDVHAGNMIKELAPIV 840

Query: 841 AGRGGGKPDMAMAGGSDASKIAELLAAVAE 870 

AGRGGGKPDMAMAGGSDASKIAELLAAVAE 
Sbjct: 841 AGRGGGKPDMAMAGGSDASKIAELLAAVAE 870 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1467

A DNA sequence (GBSx1553) was identified in S.agalactiae <SEQ ID 4507> which encodes the amino
acid sequence <SEQ ID 4508>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2974 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9747> which encodes amino acid sequence <SEQ ID 9748> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 


. (50%) , Gaps = 2/144 (1%) 

Query: 17 IKEKMFSLGGKFTITDLTGLPCYHVEGSLFPLPKTFKVFDEEEHLISQIEKKVLSFLPKF 76 

+K+KMFS FID + VEG F L + ++ D + IE+K++S LP++ 

Sbjct: 6 MKQKMFSFKDAFHIYDRDEQETFKVEGRFFSLGDSLQMTDSSGKTLVSIEQKLMSLLPRY 65 

Query: 77 NVTL7^GNHFTIKKDFSFLKPHYTIEDLDMEVKGNFWDMDFQLLKDNQVIANISQQWFRM 136 
+++ + K +F KP + I L+ E+ G+ W +FQL V 4+S++W 



Sbjct: 56 

Query: 137 TSTYQVEVYSETYNDLTTSLVIAI 160

+Y +++ E D+ I IAI 
Sbjct: 126 GDSYHLQIAYE--EDVLICTAIAI 147

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1468 

A DNA sequence (GBSx1554) was identified in S.agalactiae <SEQ ID 4509> which encodes the amino
acid sequence <SEQ ID 4510>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3833 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

1F17 [bacteriophage phi-105] 
Positives = 74/133 (54%), Gaps = 5/133 (3%)

CYTIMFEDEGKEFP 61 
++FPD G + +S EA+ A+E + +4 FE +G P 
Sbjct: 5 RYIYPALFDYDDD--GITVTFEDLPGCITFGNSGGEALTMAKEAMALHLYGFEQDGDIIP 62

Query: 62 KASSFKALASNLASDEDVIQAISVUTELVRERERSKJVNKTOT 121 

+A+K+ A++I R + V KT+T+P W+ ++ KE+KVN+S 

Sbjct: 63 EATPSKEIK AEESQSvVLIETWMPPFRHDMFJffiAVXKTLTIERWMDDIAKEHIWNYS 119 
Query: 122 QLLQKAIREELQV 134

QLLQ+AI+E L + 
Sbjct: 120 QLLQEAIKEHLGI 132

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1469 

A DNA sequence (GBSx1555) was identified in S.agalactiae <SEQ ID 4511> which encodes the amino
acid sequence <SEQ ID 4512>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1484 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA25696 GB:AB010712 NADH oxidase/alkyl hydroperoxidase 
reductase [Streptococcus mutans] 
Identities = 383/509 (75%), Positives = 441/509 (85%) 

Query: 1 MVLDKEIKAQLAQYLDLLESDIVLQADLGDNDNSQKVKDFLDEIVAMSDRISLESTHLKR 60
M LD EIK QL QYL LLES+IVLQA L D+ NSQKVK+FL EIVAMS ISLE L R
Sbjct: 1 MALDAEIKEQLGQYLQLLESEIVLQAQLKDDANSQKVKEFLQEIVAMSPMISLHEKELPR 60










PSF IAKKG ES V F+GLP+GHEFTSFIIALLQVSGR PKV+ DI+KRI+ +++ ++ 



ETYVSLTCHNCPDWQAFNIM+V+NPNI+HTM+EGGM++DE+++KGIMSVPTVYKD EF 



NGA L+AKTA+LALGAKWR INVPGE+EF NKGVTYCPKCDGPLF K VAVIGGGNSG+ 



+DR TNEE +DLEGVFVQIGLVPST WLKDSG+ LNE+ EI+V K G+TNIP IFAAGD 



CTD+AYKQIIISMGSGATAA4GAFDYLIR 



WO 02/34771 



-1622- 



PCT/GB01/04789 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4513> which encodes the amino acid
sequence <SEQ ID 4514>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0654 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 419/510 (82%), Positives = 472/510 (92%)

Query: 1 MVLDKEIKAQLAQYLDLLESDIVLQADLGDNDNSQKVKDFLDEIVAMSDRISLESTHLKR 60
M L +IK QLAQYL LLE+D+VLQ LGDN+ SQKVKDF++EI AMS+RIS+E+ L R
Sbjct: 1 MALSPDIKEQLAQYLTLLEADLVLQVSLGDNEQSQKVKDFVEEIAAMSERISIENITLDR 60

Query: 61 QPSFGIAKKGHESRVIFSGLPMGHEFTSFILALLQVSGRAPKVDEDIIKRIKGIEKTINL 120
QPSF +AKKGH S V+F+GLP+GHE TSFILALLQVSGRAPKVD+D+I RIK I++ ++ 



ETYVSLTCHNCPDWQA NIM+VLN I +HTM+EGGM+QDEVK+KGIMSVPTV+ D EEF 



TSGRATIEQLLEQ+ GPL EAFADKG+YDVLVIGGGPAGNSAA.IYAARKGLKTG+LAET 



FGGQV+ETVGIENMIGTLYTEGPKLMA++E HTKSYD+DI IK+QLAT IEKKE +EVTLA 



NGA+LQAKTAILALGAKWRNINVPGE+EFRNKGVTYCPHCDGPL1FEGKDVAVIGGGNSG+ 



EAALDLAG+ KHV VLEFLPELKAD+VLQ+RAAKT+N+TI+KNVATKDIVGEDHVTGLNY 



T+RD+ E+KH+DLEGVFVQIGLVP+T+WLKDSG+ L +R EI+VDK GSTNIPGIFAAGD 



CTD+AYKQIIISMGSGATAAIGAFDYLIRQ 
Sbjct: 481 CTDSAYKQIIISMGSGATAAIGAFDYLIRQ 510

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
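The percentages in the alignment headers are the count divided by the aligned length. The sketch below assumes the figures are truncated to a whole percent rather than rounded. That assumption reproduces the identity and gap figures quoted in this section, for example 419/510 (82%), 383/509 (75%), Gaps = 5/133 (3%) and Gaps = 1/814 (0%). Some positive figures differ by one point: 441/509 is quoted as 85%, whereas truncation gives 86%. Treat the sketch as an approximation rather than the search tool's own formula. `blast_percent` is a hypothetical helper name.

```python
def blast_percent(count: int, length: int) -> int:
    """Whole-number percentage of `count` over `length`, truncated toward zero.

    Assumption: the alignment headers truncate rather than round; this matches
    the identity and gap percentages quoted in the examples above.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // length


# Identity and gap counts taken from alignment headers in this section.
for count, length in [(419, 510), (383, 509), (5, 133), (1, 814)]:
    print(f"{count}/{length} ({blast_percent(count, length)}%)")
```

Rounding to the nearest integer would not fit the data: 241/254 is 94.9%, and the header quotes it as 94%.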

Example 1470 

A DNA sequence (GBSx1556) was identified in S.agalactiae <SEQ ID 4515> which encodes the amino
acid sequence <SEQ ID 4516>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm - 
bacterial membrane - 



bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA25695 GB:AB010712 alkyl hydroperoxidase [Streptococcus mutans]
Identities = 167/186 (89%), Positives = 179/186 (95%)

Query: 1 MSLVGKEIIEFSAQAYHDGKFITVTNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYET 60
MSLVGKE++EFSAQAYH G+F+TV NEDVKGKWAVFCFYPADFSFVCPTELGDLQEQY T
Sbjct: 1 MSLVGKEMVEFSAQAYHQGEFVTVNNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYAT 60










A related DNA sequence was identified in S.pyogenes <SEQ ID 4517> which encodes the amino acid 
sequence <SEQ ID 4518>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3022 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 173/186 (93%), Positives = 181/186 (97%) 

Query: 1 MSLVGKEIIEFSAQAYHDGKFITVTNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYET 60

MSL+GKEI EFSAQAYHDGKFITVTNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYET
Sbjct: 1 MSLIGKEIAEFSAQAYHDGKFITVTNEDVKGKWAVFCFYPADFSFVCPTELGDLQEQYET 60

Query: 61 LKSLDVEVYSVSTDTHFVHKAWHDDSDWGTITYPMIGDPSHLISQGFDVLGQDGLAQRG 120 

LKSL VEVYSVSTDTHFVHKAWHDDSDWGTITYPMIGDPSHLISQ F+VLG+DGLAQRG 
Sbjct: 61 LKSLGVEVYSVSTDTHFVHKAWHDDSDWGTITYPMIGDPSHLISQAFEVLGEDGLAQRG 120 

Query: 121 TFIIDPDGVIQMMEINADGIGRDASTLIDKVRAAQYIRQHTGEVCPAKWKEGAETLTPSL 180 
TFI+DPDG+IQMMEINADGIGRDASTLIDK+ AAQY+R+H GEVCPAKWKEGAETLTPSL 
Sbjct: 121 TFIVDPDGIIQMMEINADGIGRDASTLIDKIHAAQYVRKHPGEVCPAKWKEGAETLTPSL 180

Query: 181 DLVGKI 186 

DLVGKI 
Sbjct: 181 DLVGKI 186 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1471 

A DNA sequence (GBSx1557) was identified in S.agalactiae <SEQ ID 4519> which encodes the amino
acid sequence <SEQ ID 4520>. This protein is predicted to be 30S ribosomal protein S2 (rpsB). Analysis of 
this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4462 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA50276 GB:X70925 30S ribosomal protein [Pediococcus 
acidilactici] 

Identities = 190/260 (73%), Positives = 226/260 (86%), Gaps = 4/260 (1%)

Query: 1 MAVISMKQLLEAGVHFGHQTRRWNPKMAKYIFTERNGIHVIDLQQTVKLADQAYEFVRDA 60
M+VISMKQLLEAGVHFGHQTRRWNPKM  +IFTERNGI++IDLQ+TVKL D AY FV+D
Sbjct: 1 MSVISMKQLLEAGVHFGHQTRRWNPKMKPFIFTERNGIYIIDLQKTVKLIDNAYNFVKDV 60



Query: 61 AANDAVILFVGTKKQAAEAVAEEAKRAGQYFINHRWLGGTLTNWGTIQKRIARLKEIKRM 120

AAND V+LFVGTKKQA A+ EEAKRAGQ+ + +NHRWLGGTLTNW TIQKRI RLK++K+M
Sbjct: 61 AANDGV^FVGTKKQAQTAIEEEAKRAGQFYVNHRWLGGTLTNWNTIQKRIKRLKDLKKM 120

Query: 121 EEEGTFELLPKKEVALLNKQPARLEKFLGGIEDMPRIPDVMYVWDPHKEQIAVKEAKKLG 180

EE+GTF+ LPKKEVALLNKQ+ +LEKFLGGIEDMP IPDV++VWDP KEQIA+KEA+KL
Sbjct: 121 EEDGTFDRLPKKEVALLNKQKDKLEKFLGGIEDMPHIPDVLFVWDPRKEQIAIKEAQKLN 180




Query: 237 EAQADSIEEIVEWEGSNND 256 

DS+E++ + VE +N+ 
Sbjct: 241 GVSKDSLEDLKKTVEEGSNE 260 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4521> which encodes the amino acid
sequence <SEQ ID 4522>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.4462 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 241/254 (94%), Positives = 248/254 (96%)

Query: 1 MAVISMKQLLEAGVHFGHQTRRWNPKMAKYIFTERNGIHVIDLQQTVKLADQAYEFVRDA 60
MAVISMKQLLEAGVHFGHQTRRWNPKMAKYIFTERNGIHVIDLQQTVKLADQAYEFVRDA
Sbjct: 1 MAVISMKQLLEAGVHFGHQTRRWNPKMAKYIFTERNGIHVIDLQQTVKLADQAYEFVRDA 60

Query: 61 AANDAVILFVGTKKQAAEAVAEEAKRAGQYFINHRWLGGTLTNWGTIQKRIARLKEIKRM 120
AANDAVILFVGTKKQAAEAVA+EA RAGQYFINHRWLGGTLTNWGTIQKRIARLKEIKRM
Sbjct: 61 AANDAVILFVGTKKQAAEAVADEATRAGQYFINHRWLGGTLTNWGTIQKRIARLKEIKRM 120



EEEGTF+4LPKKEVALimQPARLEKFLGGIEDMPRIPDVMYVVnPHKEQIAvTCEAKKLG 



IPVvALWDimDPDDID+IIPANDDAIPAVKLIT+KliADA+IEGRQGEDADV F 






DSIEEIVEWEG N 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1472 

A DNA sequence (GBSx1558) was identified in S.agalactiae <SEQ ID 4523> which encodes the amino
acid sequence <SEQ ID 4524>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2648 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB73435 GB:AL139077 elongation factor TS [Campylobacter jejuni] 
Identities = 169/358 (47%), Positives = 226/358 (62%), Gaps = 19/358 (5%)

MAEITAKLVKELREKSGAGVWDAKKALvBTDGDLDKAIELLREKGMAKTAAKKADRW^AEG 6 0 
M EITA +VKELRE +GAG+MD K AL ET+GD DKA++LLREKG+ KAAKKADR+AAEG 
MTEITAAMVKELRESTGAGINM)CKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG 6 0 








Y H GR+GV+ 



L +Q4 MH+AAM+P+ LSY +LD 



++ S+ QL+D ++ +AEE IK EL A+GKPEKIWD I+PGKM+ F+ DN+++D -i 



L+ Q Y+MDD KTVE + K V F+ FEVGEG+EK + DF AEVAA + 

LMGQFYVMDDKKTVEQVIAEKEKEFGGKI KIVEF I CFEVGEGLEKKTEDFAAE VAAQL 357 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4525> which encodes the amino acid 
sequence <SEQ ID 4526>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3942 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 307/344 (89%), Positives = 327/344 (94%)

Query: 1 MAEITAKLVKELREKSGAGVIC>AKKAL 60 

MAEITAKLVKELREKSGAGVMDAKKALVOT^ 
Sbjct: 33 MAEITAKLyIGSLREKSGAGVMDAKKALVFjTTODMDK^ 92 
Query: 61 LTGVYVDGNVAAVIKVNAETDFV^^ 120

LTGVYV GNVAAV+EVNAETDFVAKN QFV LVN TAKVIAEG+P+NN+EALAL MPSGE 
Sbjct: 93 LTGVYVHGNVAAVVEVNAETDFVAKNAQFVELV^TAKVIAEGKPAISnsro 152 

Query: 121 TLEQAFVTATATIGEKISFRRFALVEKTDEQHFGAYQHHGGRIGVITWEGGDDALAKQV 180 

TL +A+V ATATIGEKISFRRFAL+EK DEQHFGAYQHNGGRIGVI+WEGGDDALAKQV 
Sbjct: 153 TLAEAYVNATATIGEKISFRRFALIEKADEQHFGAYQHNGGRIGVISWEGGDDALAKQV 212 

Query: 181 SMHVAAMKPTVLSYTELDAQFVHDELAQLNHKIEQDNESRAtTOJKPALPFLKYGSKAQLT 240

SMH+AAMKPTVLSYTELDAQF+ DELAQLNH IE DNESRAMV+KPALPFLKYGSKAQL+
Sbjct: 213 SMHIAAMKPTVLSYTELDAQFIKDELAQLNHAIELDNESRAMVDKPALPFLKYGSKAQLS 272

Query: 241 DEVIAQAEEDIKAEIAAEGKPEKIWDKIVPGKMDRFMLDNTKVDQEYTLLAQVYIMDDSK 300

D+VI AE DIKAEIAAEGKPEKIWDKI+PGKMDRFMLDNTKVDQ YTLLAQVYIMDDSK
Sbjct: 273 DDVITAAEADIKAEIAAEGKPEKIWDKIIPGKMDRFMLDNTKVDQAYTLLAQVYIMDDSK 332

Query: 301 TVEAYLESVNAKAVAFVRFEVGEGIEKASNDFEAEVAATMAAAL 344 

TVEAYL+SVNAKA+AF RFEVGEGIEK +NDFE+EVAATMAAAL 
Sbjct: 333 TVEAYLDSVNAKAIAFARFEVGEGIEKKANDFESEVAATMAAAL 376

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1473 

A DNA sequence (GBSx1559) was identified in S.agalactiae <SEQ ID 4527> which encodes the amino
acid sequence <SEQ ID 4528>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1312 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1474 

A DNA sequence (GBSx1560) was identified in S.agalactiae <SEQ ID 4529> which encodes the amino
acid sequence <SEQ ID 4530>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.86 Transmembrane 128 - 144 ( 124 - 152)
INTEGRAL Likelihood = -4.57 Transmembrane 35 - 51 ( 33 - 53)
INTEGRAL Likelihood = -4.04 Transmembrane 92 - 108 ( 87 - 111)



Final Results 

bacterial membrane --- Certainty=0.4142 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 
Identities = 47/137 (34%), Positives = 71/137 (51%), Gaps = 5/137 (3%)

Query: 12 IPLVELRGAVPFAIANGIPLWEALAIGVVGNMLPVPIIFFFARKVLEWGADKPYTGKFFT 71

+P+VELRG +P + G+ WEAL G++GN+LP+ I R + W + + + 

Sbjct: 1 MPXVELRGGIPLGWLGLSPWEALLFGIIGNLLPIVPILLLFRPISGWMLRFKWYQRLYD 60

Query: 72 WCLKKGHSGGQKLEKVAGEKGLFIALLLFVGIPLPGTGAWTGTLAASLLDWEFKHSVIAV 131

W + +EK I L+LF +PLP TGA++ LAA L F+ + AV 

Sbjct: 61 WLYNRTMKKSNNVEKFGA IGLILFTAVPLPTTGAYSACLAAVLFFIPFRFAFFAV 115 

Query: 132 MLGVILAGCIMGTLSII 148 

GV++AG +M SI 
Sbjct: 116 SAGVVIAGIVMTLFSYI 132



No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8817> and protein <SEQ ID 8818> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: 3.98 
GvH: Signal Score (-7.5): -2.35

Possible site: 26 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 3 value: -7.86 threshold: 0.0 

INTEGRAL Likelihood = -7.86 Transmembrane 128 - 144 ( 124 - 152) 
INTEGRAL Likelihood = -4.57 Transmembrane 35 - 51 ( 33 - 53)

INTEGRAL Likelihood = -4.04 Transmembrane 92 - 108 ( 87 - 111) 
PERIPHERAL Likelihood = 12.20 109 
modified ALOM score: 2.07 
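The ALOM summary line appears to be derived from the segment likelihoods listed beneath it. Here the count (3) equals the number of INTEGRAL segments scoring below the 0.0 threshold, and the value (-7.86) is the lowest score. The same reading fits the SEQ ID 8820 entry in Example 1476 (count 1, value -2.34) and the SEQ ID 8822 entry in Example 1477 (count 0, value 7.69, where only a PERIPHERAL score is listed). The sketch below encodes that reading. It is inferred from these outputs, not taken from ALOM's documentation, and `alom_summary` is a hypothetical name.

```python
from __future__ import annotations


def alom_summary(likelihoods: list[float], threshold: float = 0.0) -> tuple[int, float]:
    """Summarise per-segment ALOM likelihoods as (count, value).

    Assumed reading of the PSORT output in this document: `count` is the number
    of segments scoring below `threshold`, and `value` is the lowest score seen.
    """
    if not likelihoods:
        raise ValueError("at least one likelihood is required")
    count = sum(1 for score in likelihoods if score < threshold)
    return count, min(likelihoods)


# INTEGRAL likelihoods listed above for SEQ ID 8818.
print(alom_summary([-7.86, -4.57, -4.04]))  # (3, -7.86)
```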

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4142 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 105-109 
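The motif annotations give 1-based, inclusive residue ranges. Examples are "LPXTG motif: 105-109" here and "RGD motif: 285-287" for the S.pyogenes sequence in Example 1476. The sketch below shows such a scan using regular expressions, where X in LPXTG stands for any residue. The toy sequence is hypothetical, because the full protein sequences appear only in the sequence listing. `find_motifs` and `MOTIFS` are illustrative names.

```python
from __future__ import annotations

import re

# Motif patterns as regular expressions; "X" in LPXTG means any residue.
MOTIFS = {"LPXTG": "LP.TG", "RGD": "RGD"}


def find_motifs(seq: str) -> list[tuple[str, int, int]]:
    """Return (motif, start, end) hits using 1-based inclusive coordinates."""
    hits = []
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead also reports overlapping occurrences.
        for match in re.finditer(f"(?=({pattern}))", seq):
            start = match.start() + 1
            hits.append((name, start, start + len(match.group(1)) - 1))
    return sorted(hits, key=lambda hit: hit[1])


toy = "MKKLPETGAAARGDSS"  # hypothetical sequence, not from the listing
print(find_motifs(toy))  # [('LPXTG', 4, 8), ('RGD', 12, 14)]
```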



The protein has homology with the following sequences in the databases: 

186 216 246 276 306 336 366 395

LTIIISNF*KIRK*NLSKDSKTRMTADFSCHY*KDKIKW1 , INTIERFYLMOTIITFLISMIPLVELRGAVPFAIANGIPLW 

= |: 11111=1=1= I 
MPFSELRGAI PLALYFGFSPA 
10 20 


426 456 486 516 546 576 591 621 

FJUiAIGWGNMLPVPIIFFFARKVLEWGADKPYTGKFFTOCLKKGHSGGQKLEKVAGEKGL FIALLLFVGIPLPG 

II : 1=11=1111 = = = l = = = = = =1 = 1 11= == I =11 llll 

EAYLLSVLGNILPVPFLLLFLDYLVRIATKVELLARIYR RWERVERRKGWERYGYLGLTIFVAIPLPV 

40 50 60 70 80 90



651 681 711 741 771 801 831 861 

TGAWTGTLAASLLDWEFKHSVIAVMLGVILAGCIMGTLSIIGFNLF*KS*GEMTVSPF*YLPIHQFDSKIRHLT*AKCLI 
Mi IN | || = = = II =11 == II 1= 

TGAWTGTLLAFLLQLNRLKAFLFISAGVCIAGVWLLASIGIIRLL
110 120 130 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1475

A DNA sequence (GBSx1561) was identified in S.agalactiae <SEQ ID 4531> which encodes the amino
acid sequence <SEQ ID 4532>. This protein is predicted to be CtsR protein (ctsR). Analysis of this protein 
sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.3672 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB91548 GB:AJ249133 CtsR protein [Lactococcus lactis] 
Identities = 74/146 (50%), Positives = 103/146 (69%), Gaps = 3/146 (2%)

Query: 4 KNTSDNIEEYIKSLLEQSGIAEIKRSNLADTFQWPSQINYVIKTRFTESRGYVVESKRG 63

KNTSD IE Y++ LLE++ + EIKR++IA+ F WPSQINYVI KTRFT S+G+ VESKRG 
Sbjct: 5 KNTSDIIEAYLRQLLEEAQVIEIKRADLAKQFDWPSQINYVIKTRFTASKGFDVESKRG 64

Query: 64 GGGYIRIAKVHFSDQHQLFGNMLSTIGERISEQVFDDLIQLLFDEEIITEREGNLILATS 123

GGGYI+I K +S +H+ + + +S + D++QLLFDE+ ++TEREGNL+L 
Sbjct: 65 GGGyiKIViWQYSARHEFrjTALYQKOTANLSSKAAHDIVQLLFDEIO/TjTEREGNLLLLVI 124 

Query: 124 GDDVLGEQASVIRARMLRKLLQRLDR 149 

D G + R M++ ++ RLDR 
Sbjct: 125 TD GAISPFTRGIMMKSIINRLDR 147 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4533> which encodes the amino acid 
sequence <SEQ ID 4534>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2514 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/151 (77%), Positives = 131/151 (86%) 
Query: 1 l^IKNTSDNIEEYIKSLLEQSGIAEIKRSNLADTFQWPSQINYVIKTRFTESRGYVVES 60
Sbjct 

Query: 61 KRGGGGYIRIAKVHFSDQHQLFGNMLSTIGERISEQVFDDLIQLLFDEEIITEREGNLIL 120

KRGGGGYIRIAKVHFSD+H L GN+++TI + ISEQVF D IQLLFDE ++TEREGN+IL
Sbjct: 61 KRGGGGYIRIAKVHFSDKHHLIGNLMATIEDCISEQVFTDSIQLLFDEHLLTEREGNIIL 120 

Query: 121 ATSGDDVLGEQASVIRARMLRKLLQRLDRKG 151 

A + DDVLG S IRARML +LLQR+DRKG 
Sbjct: 121 AVASDDVLGTDGSTIRARMLYRLLQRIDREG 151 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1476

A DNA sequence (GBSx1562) was identified in S.agalactiae <SEQ ID 4535> which encodes the amino
acid sequence <SEQ ID 4536>. This protein is predicted to be ClpC (clpB-1). Analysis of this protein 
sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.34 Transmembrane 32 - 48 ( 32 - 49) 



Final Results

bacterial membrane --- Certainty=0.1935 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD01783 GB:AF023422 ClpC [Lactococcus lactis] 
Identities = 401/831 (48%), Positives = 571/831 (68%), Gaps = 52/831 (6%)

Query: 4 YSIKLQEVFRLAQFQAARYESHYLESWHLLLAMVLVHDSVAGLTFAEYE---SEVAIEEY 60 

Y+ L +F A A +Y+ +ES HLL AM S+A A S++ 1 + 

Sbjct: 8 YTPTLDRIFEKAAEYAHQYQYGTIESAHLIAAMATTSGSIAYSILAGMNVDSSDLLIDLE 67 

Query: 61 FAATIIJUjGRAPKEEITircQFLEQSPALKjaLKlAENISIWG 120 

+ ++ + + R+ L SP ++++ +A +++ AE VGTEH+L A+L + 

Sbjct: 68 DLSSHVKVKRSE LRFSPRAEEWTVASFLAVHNNAEAVGTEHLLYALLQVE 118 

Query: 121 DLLATRILELVGFRGQDDGESVRMVDLRKALERHAGF-TKDDIKAIYELRNPKKAKSGAS 179

D ++L+L + + +V LRK +E+ G ++ KA+ + K AK A 

Sbjct: 119 DGFGLQLLKL QK1NIVSLRKEIEKRTGLIVPENKKAVTPMSKRKMAKGVAE 169 

Query: 180 FSDMMKPPSTAGDLADFTRDLSQMAVDGEIEPVIGRDKEISRMVQVLSRKTKNNPVLVGD 239

S+ L + DL++ A G+++P+IGR+ E+ R++ +LSR+TKNNPVLVG+ 
Sbjct: 170 NSSTPTLDSVSSDLTEAARSGKLDPMIGREAEVDRLIHILSRRTKNNPVLVGE 222 

Query: 240 AGVGKTALAYGLAQRIANGNIPYELRDMRVLELDMMSVVAGTRFRGDFEERMNQIIADIE 299

GVGK+A+ GLAQRI NG +P L + L+M +WAGT+FRG+FE+R+ 1+ ++ 

Sbjct: 223 PGVGKSAIIEGIAQRIWGQVP1GLMNSRIMALNMATWAGTKFRGEFEDRLTAIVEEVS 282 

Query: 300 EDGHIILFIDELHTIMGSGSGIDSTLDAANILKPALARGTLRTVGATTQEEYQKHIEKDA 359 

D +I+FIDELHTI+G+G G+DS DAANILKPALARG + VGATT EYQK+IEKD 
Sbjct: 283 ADPDVI I FIDELHTI IGAGGGTOS Vl®AANILiKPALARGDFQMVGATTYHEYQKYIEKDE 342 

Query: 360 ALSRRFAKVLVEEPNLEDAYEILLGLKPAYEAFHNVTISDEAVMTAVKVAHRYLTSKNLP 419

AL RR A++ V+EP4 ++A ID GL+ +E +H V +D+A+ +AV ++ RY+TS+ DP 
Sbjct: 343 ALERRLARINVDEPSPDEAIAILQGLREKFEDYHQVKFTEQAIKSAVTLSVRYMTSRKLP 402 

Query: 420 DSAIDLLDEASATVQMMI KKNAPSLLT ETOQAI LDDDMKSA 460 

D AIDLLDEA+A V++++K ++ E+ +A++ D+K++ 

Sbjct: 403 DKAIDLLDEAAARVKILLKTKKQNVFELEKD FVKAQEELAEAVI KLDVKASRI KEKAVEK 462 

Query: 461 --SKALKASYKGKKRKPIAVTEDHIMATLSRLSGIPVEKLTQADSKKYLNLEKELHKRVI 518 

K K S K +KR+ VT+ ++A S L+G+P+ ++T+++S + +NLEKELHKRV+ 
Sbjct: 463 ISDKIYKFSIKEEKRQE--VTDQAVIAVASTLTGVPITQMTKSESDRLINLEKELHKRVV 520

Query: 519 GQDDAVTAISRAIRRNQSGIRTGKRPIGSFMFLGPTGVGKTEIAKALAEVLFDDESALIR 578 

GQ++A++A+SRAIRR +SG+ +RP+GSFMFLGPTGVGKTELAKALA+ +F E +IR 
Sbjct: 521 GQEFAISAVSRAIRRARSGVADSPJ^PMGSFMFLGPTGVGKTELAKALADSVFGSEDNMIR 580 

Query: 579 FDMSEYMEKFAASHLNGAPPGYVGYDEGGELTEKVRNKPYSVLLFDEVEKAHPDIFNVLL 638 

DMSE+MEK + S L GAPPGYVGYDEGG+LTE+VRNKPYSV+L DEVEKAH D+FN++L 
Sbjct: 581 VDMSEFMEKHSTSRLIGAPPGYVGYDEGGQLTERVRNKPYSVVLLDEVEKAHLDVFNIML 640

Query: 639 QVLDDGVLTDSRGRKVDFSNTIIIMTSNLGATALRDDKTVGFGAKDISHDYTAMQKRIME 698

Q+LDDG +TD++GRKVDF NTIIIMTSNLGATALRDDKTVGFGAK+I+ DY+AMQ RI+E
Sbjct: 641 QILDDGFVTDTKGRKVDFRNTIIIMTSNLGATALRDDKTVGFGAKNITADYSAMQSRILE 700
Query: 699 ELKKAYRPEFINRIDEKVVFHSLSQDNMREVVKIMVKPLILALKDKGMDLKFQPSALKHL 758

ELK+ YRPEF+NRIDE +VFHSL + ++VKIM K LI L ++ + +K PSA+K + 
Sbjct: 701 ELKRHYRPEFLNRIDENIVFHSLESQEIEQIVKIMSKSLIKRLAEQDIHVKLTPSAIKLI 760 

Query: 759 AEDGYDIEMGARPLRRTIQTQVEDHLSELLLANQVKEGQVIKIGVSKGKLK 809 

AE G+D E GARPLR+ +Q +VED LSE LL+ ++K G I IG S K+K 
Sbjct: 761 AEVGFDPEYGARPLRKALQKEVEDLLSEQLLSGEIKAGNHISIGASNKKIK 811 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4537> which encodes the amino acid 
sequence <SEQ ID 4538>. Analysis of this protein sequence reveals the following: 

Possible site: 44 



Final Results 

bacterial membrane --- Certainty=0.1702 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 285-287 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 618/814 (75%), Positives = 716/814 (87%), Gaps = 1/814 (0%) 

Query: 1 MSHYSIKLQEVFRLAQFQAARYESHYLESWHLLLAMVLVHDSVAGLTFAEYESEVAIEEY 60

M YS K+Q++FR AQFQAAR++SH LE+WH+LLAMV V +S+A + +EY+++VAIEEY 
Sbjct: 1 MIITOSTKMQDIFRQAQFQAARFDSHCLETWHVnLAIV^ 60 

Query: 61 EAATILALGFAPKEEITNYQFLEQSPALKKILKLAENISIWGAEDTO^ 120 

EAA ILA+G4 PKE+++ F QS L +L A+ IS + ++VG+EHVL A+L+N 
Sbjct: 61 EAAAIIiAMGKTPtCEQLSRvDFRPQSKTLTNLLAFAQAISQITRDQEVGSEHVLFAILLNP 120 

Query: 121 DLLATRILELVGFRGQDDGESV-RMVDLRKALERHAGFTKDDIKAIYELRNPKKAKSGAS 179 

D++A+R+LE+ G++ +D+G R+ DLRKA+ERHAG++K+ IKAI+ELR PKK K+ + 
Sbjct: 121 DIMASRLLEIAGYQIKDNGNGQPRLADLRKAIERHAGYSKEMIKAIHELRKPKKTKTQGT 180 

Query: 180 FSDMMKPPSTAGDLADFTRDLSQMAVDGEIEPVIGRDKEISRMVQVLSRKTKNNPVLVGD 239

FSDMMKPPSTAG+L+DFTRDL++MA G +E VIGRD+E4SRM+QVLSRKTKNNPVLVGD 
Sbjct: 181 FSDMMKPPSTAGELSDFTRDLTEMARQGLLESVIGRDQEVSRMIQVLSRKTKNNPVLVGD 240 

Query: 240 AGVGKTALAYGLAQRIANGNIPYELRDMRVLELDMMSVVAGTRFRGDFEERMNQIIADIE 299

AGVGKTALAYGLAQRIANG IPYEL++MRVLELDMMSVVAGTRFRGDFEERMNQII DIE
Sbjct: 241 AGVGKTALAYGLAQRIANGAIPYELKEMRVLELDMMSVVAGTRFRGDFEERMNQIIDDIE 300

Query: 300 EDGHIILFIDELHTIMGSGSGIDSTLDAANILKPALARGTLRTVGATTQEEYQKHIEKDA 359

DG IILF+DELHTIMGSGSGIDSTLDAANILKPAL+RGTL VGATTQEEYQKHIEKDA
Sbjct: 301 ADGQIILFVDELHTIMGSGSGIDSTLDAANILKPALSRGTLHMVGATTQEEYQKHIEKDA 360

Query: 360 ALSRRFAKVLVEEPNLEDAYEILLGLKPAYEAFHNVTISDEAVMTAVKVAHRYLTSKNLP 419 

ALSRRFAK+L+EEPN EDAY+ IL+GLK +YE +HNV+IS+EAV TAVK+AHRYLTSKNLP 
Sbjct: 361 ALSRRFAKILIEEPNTEDAYQILMGLKLSYETYHNVSISNEAVKTAVKMAHRYLTSKNLP 420

Query: 420 DSAIDLLDEASATVQMMIKKKAPSLLTEVDQAILDDDMKSASKALKASYKGKKRKPIAVT 479 

DSAIDLLDEASA VQ M+KK+AP LT +DQA+++ DMK S+ L KG+ RKP VT 
Sbjct: 421 DSAIDLLDFASAAVQNMVTCKSAPETLTPIDQALINGDMKKVSRLIAKEAKGQ^KPTPVT 480 

Query: 480 EDHIMATLSRLSGIPVEKLTQADSKKYLNLEKELHKRVIGQDDAVTAISRAIRRNQSGIR 539 

ED I +ATLS+LSGI P+EKLTQADSKKYLNLEKELHKRVIGQD AVTAISRAIRRNQSGIR 
Sbjct: 481 EDDIIATLSKISGIPLEKLTQADSKKYLNLEKELHKRVIGQDAAVTAISRAIRRNQSGIR 540
Query: 600 YVGYDEGGELTEKVRNKPYSVLLFDEVEKAHPDIFNVLLQVLDDGVLTDSRGRKVDFSNT 659

YVGYDEGGELT+KVRNKPYSVLLFDEVEKAHPDIFNVLLQVLDDG+LTDSRGRKVDFSNT
Sbjct: 601 YVGYDEGGELTQKVRNKPYSVLLFDEVEKAHPDIFNVLLQVLDDGILTDSRGRKVDFSNT 660

Query: 660 IIIMTSNLGATALRDDKTVGFGAKDISHDYTAMQKRIMEELKKAYRPEFINRIDEKVVFH 719

IIIMTSNLGATALRDDKTVGFG KDI D+ AM+KRI+EEL+K YRPEFINRIDEKVVFH
Sbjct: 661 IIIMTSNLGATALRDDKTVGFGVKDIHQDHQAMEKRILEELRKTYRPEFINRIDEKVVFH 720

Query: 720 SLSQDNMREVVKIMVKPLILALKDKGMDLKFQPSALKHLAEDGYDIEMGARPLRRTIQTQ 779

SL+QDNMR+VVKIMV+PLI L +KG+ LK QP ALKHL+E GYD MGARPLRRT+QT+
Sbjct: 721 SLTQDNMRDVVKIMVQPLITTLASKGITLKIQPLALKHLSEVGYDEHMGARPLRRTLQTE 780

Query: 780 VEDHLSELLLANQVKEGQVIKIGVSKGKLKFDIA 813 

+ED LSEL+L+ ++ G +KIG+S GKL F IA
Sbjct: 781 IEDKLSELILSRELTSGHTLKIGLSHGKLTFHIA 814 

A related GBS gene <SEQ ID 8819> and protein <SEQ ID 8820> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: -13.52 
GvH: Signal Score (-7.5): -2.1 

Possible site: 49 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -2.34 threshold: 0.0 

INTEGRAL Likelihood = -2.34 Transmembrane 32 - 48 ( 32 - 49) 
PERIPHERAL Likelihood = 0.95 112 
modified ALOM score : 0.97 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.1935 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

47.4/69.6% over 804aa 

Listeria monocytogenes 

EGAD|136761| ClpC ATPase Insert characterized

GP|1314297|gb|AAC44446.1| |U40604 ClpC ATPase Insert characterized
ORF00207(298 - 2727 of 3045)

EGAD|136761| 145854 (2 - 806 of 825) ClpC ATPase {Listeria monocytogenes}
GP|1314297|gb|AAC44446.1| |U40604 ClpC ATPase {Listeria monocytogenes}
%Match = 33.6

%Identity = 47.4 %Similarity = 69.6

Matches = 372 Mismatches = 229 Conservative Subs = 174

87 117 147 177 207 237 267 297

SFF*STPIIWKYVINDWRAYQ*TSF**FDSIIIR*RDNYRT*RKFDSGDIR**RLRRASLCY*SSYAP*IITTIR*KRIP 



FMSHYSIKLQEVFRLAQFQAARYESHYLESWHLLU^MVLVHDSVAGLTFAEYESEVAIEEYEAATILALGRAPKEEITNY 

:: : |:|: hi :| I | : hi I =| : :| I I I :: h : :| :: :| 

MFGRFTQRAQKVLALSQEEAMRLNHSNLGTEHILLGLVREGEGIAA- -KALYELGISSEKVQQEVEGLIGHG-EKAVTTI 



QFLEQSPALKKILKIAENISIWGAEDVGTEHVLLAMLVTIKDLIATRII^LVGFRGQDDGESV-RMTOLRKALERHAGFT 

h I lh::|= = = = I llllhll == : =1 hi =1 : | ::: . | 

QYT- - -PRAKKVIELSMDEARKLGHTWGTEHILLGXiIREGEGV2iARVIjSNLGISLNKARQQVLQLLGGGDA- - - 






804 834 8S4 894 924 954 984 1014 

KDDIKAIYELRNPKKAKSC^FSDMMKPPST^ 

:|l = I II Mb =1 = -Mill III I = : = I I I I = I I I I I I h I 

-- - TGAGRQTNTQATPTLDSLA- - -RDLTVIAREXINLDPVIGRSKEIQRVIEVLSRRTKNNPVLIG 

150 160 170 180 190 200 



1044 1074 1104 1134 1164 1194 1224 1254 

DAGVGKTAIAyGIAQRIANGNIPYELRD^!KVLELDMMSWAGTRFRGDFEER^MQIIADIEEDGHIILFIDELI^:IMGSG 
= I'M lllhl :| II ||: III :|||||::||:||:|: : = = =1 = I = = I I I I I I I I I = = I : I

EPGVGKTAIAEGLAQQIVRNEVPETLRGKRWLDMGTWAGTKfRGEFEDRLKKVMDEIRQAGOTI 

220 230 240 250 260 270 280 

1284 1314 1344 1374 1404 1434 1464 1494 

SGIDSTLDAANILKPAIARGTLRTVGATTQEEYQKHIEKDAALSRRFAKM,VEEPmEDAYEILLGLKPAYEAFHKrUTIS
I : I I I I 1= =1111 :||*l*llll II III = Ml -I" =11 11= 111 II h 

-GAEGAIDASNILKPPLARGELQCIGATTLDEYRKYIEKDRALERRFQPIKVDEPTVEESIQILHGLRDRYEAHHRVAIT 
300 310 320 330 340 350 360 

1524 1554 1584 1614 1644 1674 1704

DEAVMTAVKVAHRYLTSKNLPDSAIDLLDEASATVQM MIKKNAPSLLTEVDQAILDDDMKSASKALKASY 

|||: ||::: || :: : ||| |||::|| : : |:: :: | | | | |: , : : |: 

DEALEAAVRLSDRYISDRFLPDKAIDVIDESGSTORLKSFTTPKWKEMENHLSDLKKEKDAAVOJ3QEFEKAASLRDKEQ 
380 390 400 410 420 430 440 


1725 1737 1767 1797 1827 1857 1887 

KGKK- - -RKPIA VTEDHIMATLSRLSGIPVEKLTQADSKKYLNLEKELHKRVIGQDDAVTAISR 

III =1 : 1111 : - :|lll II = - I 11=11 11=111111 II I = I 

KLKKSLDKKSLEETKAMQEKQGLDHSEVTEDIVAEWASWTG^ 
460 470 480 490 500 510 520

1917 1947 1977 2007 2037 2067 2097 2127 

AIRRNQSGIRTGKRPIGSFMFLGPTGVGKTEIAKAIAEVLFDDESALIRFDMSEYI-ISKFAASHLNGAPPGYVGYDEGGEL 
hll ::! = = I I ! I I I I = I I I 1 I I I I I II I I = I I I I =1 II -II lllllllll: = I I I I 1 I I I I I = I 1 I = I 
AVRRARAGLKDPKRPIGSFIFLGPTGVGKTEIjARAIAESMFGDEDSMIRIDMSEYI^EKFSTARLVGAPPGYVGYEEGGQL
540 550 560 570 580 590 600 

2157 2187 2217 2247 2277 2307 2337 2367 

TEKVRNKPYSVLLFDEVEKAHPDIFNVLLQTODDGVLTDSRGRKVDFSNTIIIMTSNLGATALRDDKTVGFGAKDISHDY 
|| l||||:|:||:||||l|:||:|||||lll lllhl III Ihlllllhll ll-ll I 1 =

TEKVRQKPYSVVLLDEIEKAHPDVFMvILLQVLDDGRLTDSKGRWDFROTVIIMTSNIGAQEMKQDKBMGFIWTDPLKDH 
620 630 640 650 660 670 680 

2397 2427 2457 2487 2517 2547 2577 2607 

TAMQKRMELKKAYRPEFINRIDEKWFHSLSQDNMREVVO

II: |::::||:|:|||||imi -Will : : : I : : I I ' ' ■ \ ■ I = I I I II I II 

KAMEHRVLQDLKQAFRPEFINRIDETIVFHSLQEKELKQIVTLLTAQLTKRLAERDIHVKLTEGAKSKIAKDGYDPEYGA 
700 710 720 730 740 750 760 

2637 2667 2697 2727 2757 2787 2817 2847

RPLRRTIOTQVEDHLSELLLANQVKEGQVIKIGVSKGKLKFDIAKS*KIPVPMGTGILI*KENVQNILDIFL*IYEK*KD 
111:1 II :||| III II =| I -III III: 

RPLKRAIQKEVEDMLSEELLRGNIKA7GDYT7EIGVKEGKLETOKKDAPKKKT7SKKVKAK 
780 790 800 810 820 


There is also homology to SEQ ID 258. 

SEQ ID 8820 (GBS26) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 7 (lane 9; MW 93.3kDa), in Figure 167 (lane 16 & 17; MW 108kDa) and in 
Figure 239 (lane 14; MW 108kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE 
analysis of total cell extract is shown in Figure 15 (lane 7; MW 118kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1477

A DNA sequence (GBSx1563) was identified in S.agalactiae <SEQ ID 4539> which encodes the amino
acid sequence <SEQ ID 4540>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4541> which encodes the amino acid
sequence <SEQ ID 4542>. Analysis of this protein sequence reveals the following: 

Possible site: 17 
>>> Seems to have a cleavable N-term signal seq. 
Final Results 



bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 178/213 (83%), Positives = 199/213 (92%)

Query: 1 MLIVIAGTIGAGKSSLAAALGQHLGTDVFYEAVDNNPVLDLYYQDPQKYAFLLQIFFLNK 60

MLI VLAGTIGAGKS SLAAALG+HLGTDVFYEAVDNNP VLDLYYQDP+KYAFLLQI + FLNK 
Sbjct: 1 MLIVIAGTIGAGKSSIAAALGEHLGTDVFYEAVDNNPVLDLYYQDPKKYAFLLQIYFLNK 60 

Query: 61 RFQSIKEAYKANNNVLDRSIFEDELFLTLNYKNGNVTKTELDIYKELLANMLEELEGMPK 120

RF+ S I KEAY+A+NN+LDRS I FEDELFL LNYKNGNVTKTELDIY+ELLftNMLEELEGMPK 
Sbjct: 61 RFKSIKEAYQADNNILDRSIFEDELFLKLNYKNGNVTKTELDIYQELLANMLEELEGMPK 120

KRPDLI1+YIDVSFDKMLERI++RGRSFEQVD NP L YY^VH EYP WYE+Y+VSPK+ 
Sbjct: 121 KRPDLLIYIDVSFDKMLERIERRGRSFEQVDGNPSLEQYYHQVHGEYPTWYEDYEVSPKM 180 

Query: 181 RIDGNKLDFVKNPEDLQHVLDTIDSELQKLDLL 213 

+IDGN LDFV+NP+DL VL ID++L++L LL 
Sbjct: 181 KIDGNSLDFVQNPQDLATVLKMIDTKLKELHLL 213 
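Each alignment header gives Identities, Positives and, where present, Gaps as a count over the alignment length, followed by a whole-number percentage. The sketch below re-derives that percentage, assuming it is truncated rather than rounded. This is an assumption and not documented behaviour of the alignment program. Truncation does reproduce figures such as 281/323 (86%) and 235/288 (81%), where rounding would give 87% and 82%. However, some Positives figures in this section are one point below the truncated value (305/323 is listed as 93%), so treat the code as a consistency check only. The names `truncated_percent` and `check_header` are hypothetical.

```python
import re

# Matches fields such as "Identities = 178/213 (83%)" in an alignment header.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def truncated_percent(count, length):
    """Whole-number percentage, truncated toward zero (an assumption)."""
    return (100 * count) // length


def check_header(header):
    """Map each field to (count, length, reported %, recomputed %)."""
    result = {}
    for name, count, length, reported in FIELD_RE.findall(header):
        count, length = int(count), int(length)
        result[name] = (count, length, int(reported), truncated_percent(count, length))
    return result


if __name__ == "__main__":
    header = "Identities = 281/323 (86%) , Positives = 305/323 (93%)"
    for name, (count, length, reported, recomputed) in check_header(header).items():
        status = "matches" if reported == recomputed else "differs"
        print(f"{name}: {count}/{length} listed {reported}%, recomputed {recomputed}% ({status})")
```

For the Example 1478 header, this reports a match for Identities and a one-point difference for Positives.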

A related GBS gene <SEQ ID 8821> and protein <SEQ ID 8822> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: 3.94 
GvH: Signal Score (-7.5): 1.42 

Possible site: 17 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 7.69 threshold: 0.0
PERIPHERAL Likelihood = 7.69 49
modified ALOM score: -2.04

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
SEQ ID 4540 (GBS9) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 5; MW 52kDa) and Figure 12 (lane 2 & 3; MW 50.3kDa). It was also 
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 2 
(lane 6; MW 27kDa) and Figure 3 (lane 2; MW 25kDa). The GBS9-GST fusion product was purified 
(Figure 191, lane 6) and used to immunise mice. The resulting antiserum was used for FACS (Figure 318), 
which confirmed that the protein is immunoaccessible on GBS bacteria. 
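Each expression result quotes a molecular weight in kDa for the fusion product, but the document does not say how these values were obtained. As a point of reference only, a theoretical average mass for an unmodified polypeptide can be estimated by summing commonly tabulated average residue masses and adding one water molecule. The sketch below does just that. The function names are hypothetical, and the calculation ignores tags, linkers and modifications unless their residues are included in the input string.

```python
# Commonly tabulated average residue masses in daltons (residue = amino acid minus water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mass_da(sequence):
    """Theoretical average mass (Da) of an unmodified linear polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


def mass_kda(sequence):
    """Same estimate expressed in kDa."""
    return average_mass_da(sequence) / 1000.0


if __name__ == "__main__":
    # Illustrative input only: Query residues 181-213 from the Example 1477 alignment.
    fragment = "RIDGNKLDFVKNPEDLQHVLDTIDSELQKLDLL"
    print(f"{mass_kda(fragment):.2f} kDa")
```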

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1478 

A DNA sequence (GBSx1564) was identified in S.agalactiae <SEQ ID 4543> which encodes the amino
acid sequence <SEQ ID 4544>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1182 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4545> which encodes the amino acid 
sequence <SEQ ID 4546>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 281/323 (86%), Positives = 305/323 (93%)

Query: 3 QMSSFMIGKVEIPHRTVLRPMAGITNSAFRTIAKEFGAGLWMEMISEKGLLYNlffiKTL 62
+LNSSF IG VEIPHRTVIiAPiyiAG+TNSftFRTIAKEFGAGLVvMENISEKGLLYNNEKTL
Sbjct: 27

Query: 63
Sbjct: 87

Sbjct: 147

Query: 183
Sbjct: 207

Query: 243
Sbjct: 267

Query: 303
Sbjct: 327

HMLHIDENEHPMSIQLFGGDAEGLKRaADFIQ+M'KADIVDINMGCPWKVVTCNEAGAKW
MYTGTCDHETL +V+KA+T IPFI NGD+R+V DAKFMIEEIG DA+M+GR A +NPY+F
TQINHFFETG+ LPDLPF K LD+A+DHL RL+NLKGETIAVREFRGLAPHYLRG +GAA
K+RGAVSRAETLAEV+ +F
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1479 

A DNA sequence (GBSx1565) was identified in S.agalactiae <SEQ ID 4547> which encodes the amino
acid sequence <SEQ ID 4548>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2164 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 3930: 

Identities = 235/288 (81%), Positives = 259/288 (89%)

Query: 1 MDKIIKSISTSGSFRAYVLDCTETTOTAQEKHQTLSSSTVALC3RTLIANQILAANQKGNS 60 

MDKIIKSI+ SG+FRAYVLD TETV AQEKH T1SSSTVALGRTLIANQILAANQKG+S 
Sbjct: 1 MDKIIKSIAQSGAFRRYVIjDSTETVAI^QEKHNTLSSSTvALGRTLIANQILAANQKGDS 60

Query: 61 KVTVWIGDSSFGHIISVRDTKGWKGYIQffl'GVDIKKTATGEVLVGPFMGNGHFVVITD 120 

K+TvWIGDSSFGHIISVADTKG+VKGYIQNTGVDIKKTATGEVLVGPFMGNGHFV I D 
Sbjct: 61 KITVTWIGDSSFGHIISVADTKGHVKGYIQNTGVDIKKTATGEVLVGPFMGNGHFVTIID 120 

Query: 121 YATGQPYTSTTPLITGEIGEDFAYYLTESEQTPSAVGLNVLLDDEDKVKVAGGFMLQVLP 180 

Y TG PYTSTTPLITGEIGEDFAYYLTESEQTPSA+GLNVLLD+ DKVKVAGGFM+QVLP 
Sbjct: 121 YGTGNPYTSTTPLITGEIGEDFAYVLTESEQTPSAIGLNVTjLDEMDKvTCVAGGFIWQVLP 180 

Query: 181 GASDEEISRYEKRIQEMPSISSLLESENHIESLLSAIYGEDDYKRLSEDSLAFYCDCSKE 240 

GAS+EEI+RYEKR+QEMP+IS LL S+NH+++LL AIYG++ YKRLSE+ L+F CDCS4E 
Sbjct: 181 GASEEEIARYEKRLQEMPAISHLIASKNHVDALLEAIYGDEPYKRLSEEPLSFQCDCSRE 240 

Query: 241 RFEAALLTLGTKELQAMKDEDKGVEITCQB'CNQTYYFTEEDLEKIIND 288 

RFEAAL+TL +LQAM DEDKG EI CQFC Y F E DLE II+D 
Sbjct: 241 RFEAALMTLPKADLQAMIDEDKGAEIVCQFCGTKYQFNESDLEAIISD 288 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1480 

A DNA sequence (GBSx1566) was identified in S.agalactiae <SEQ ID 4549> which encodes the amino
acid sequence <SEQ ID 4550>. This protein is predicted to be surface-located membrane protein 1 (lmp1).
Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4312 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB93480 GB:AF019377 tellurite resistance protein [Rhodobacter 
sphaeroides] 

Identities = 64/350 (18%), Positives = 146/350 (41%), Gaps = 7/350 (2%)

Query: 44 LTPAQKSAISEKTPALVDTFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDL 103

LA E ++VD+++FGA +T +L++ K + D 

Sbjct: 34 LASAPPEKAQEIPJiFJYIAEIJWSDSQSIIGFGSKAQAELQTISQQ 93 

Query: 104 LKNANRELNGFIAKYKDATPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAA 163

L+ + GF + ++ +K + ++L ++ F ++++Q++D + 

Sbjct: 94 LREWSTIRGF SVSEFDVRRKASWWERLLGRT-APFARFVARYEDVQQQIDRITQ 147 

Query: 164 KWKQEDTLARNI VSAEML I EDNTKS IEMLVGVIAFIESSQAEAANRASHLQQE I LALDS 223 

+++ E L ++I ++L + L IA + A+ R ++ +A 

Sbjct: 148 SLLTHEHRLLKDIKGLDILYARTLDFYDELALYIAAGDEVLADLDGRVIPAKEAEVAATP 207 

Query: 224 QTSEYQIKSNQIARMTEVINTLEQQHPEWSRLWAWATTPQMRNLVKVSSDMRQKLGML 283 

+ + IK+ +L + + LE++ + V +P+R + + +++ 

Sbjct: 208 E -GDRMI KAQELRDLRAARDDLERRVHDLKLTRQVTMQSLPS IRLVQENDKALVTRINST 266 

Query: 284 RRKTI PTMKLS IAQLGMMQQSVKSGVTADAIVKANNAM-QMIiAETSKEAI PMLEKTAQSP 343 

NT+P + +AQ +Q+S ++ + N L AE ++A ++ K + 

Sbjct: 267 LVim/PLl^TQLAQAOTIQRSREAAEAVRGASDLTNELLTANAENLQQANKIVRKEMERG 326 

Query: 344 TVSIKSVTALAESLVAQNNGIIAAIDKGRKERAQLESAVIKSAETINDSV 393 

1++V +L+A N +A D+GR RA E+ + + + D++ 

Sbjct: 327 VFDIEAVKKANATLIATINESLAIADEGRARRATAETELQRMEAELRDTL 376 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4551> which encodes the amino acid
sequence <SEQ ID 4552>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3230 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 333/413 (80%), Positives = 379/413 (91%) 

Query: 5 FNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFV 64 

FNFDIDQIADNA+ KTDKTT+IIS+ T GQI+FFEKL+ Q++AI+ K PALVDTF+ 
Sbjct: 4 FNFDIDQIADNAVIKTDKTTDIISDLPTDTNGQISFFEKLSADQQTAITAKAPALvDTFL 63 

Query: 65 GDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPA 124

DQNALLDFGQSAVEGVN TVNHIL+EQKK+QI PQVDDLLK+ NRELNGFIAKYKDATP 
Sbjct: 64 ADQNALLDFGQSAVEGWATVNHILAEQKKLQIPQVDDLLKSTNRELNGFIAKYKDATPV 123 

Query: 125 ELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKtffiM-lAANWKQEDTLARNIVSAEMLIE 184 

+L+KKPN +QKLFKQS+ +LQEFYFDSQNIEQKMD MAA WKQEDTLARNIVSAE+LIE 
Sbjct: 124 DLDKKPNFLQKLFKQSRDTLQEFYFDSQNIEQKMDSI>1AAAWKQEDTLARNIVSAELLIE 183 

Query: 185 DNTKSIENLVGVIAFIESSQAEAANRASHLQQEIIALDSQTSEYQIKSNQLARMTEVINT 244 

DNTKSIE+LVGVIAF1E+SQ EA+ RA+ LQ+++ DS T +YQIK++ LAR TEVINT 
Sbjct: 184 DNTKSIEHLVGVIAFIEASQKEASQRARALQKDLKTKDSATPDYQIKRDLLARTTEV1NT 243 

Query: 245 LEQQHPEWSRLWAWATTPQ^IRNLVICVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQS 304 

LEQQH EY+SRLWAWATTPQ^mNLVKVSSD^lRQKLG^IIJRRNTIPTMICLSIAQLGMMQQS 
Sbjct: 244 LEQQHTEYLSRLWAWATTPQ^re]S^^VKVSSD^QK]lGMLRRNTIPTMKLSIAQLGMMQQS 303



Query: 365 IAAIDKGRKERAQLESAVIKSAETINDSVKZRDKKIVEALLNEGKSTQEKVDE 417 

IAAID GRKERAQLESA+ 1 +SAETINTJSVK+RD+ IV+ALL+EGK TQ+ +D+ 
Sbjct: 364 IAAIDHGRKERAQLESAIIRSAETIJTOSVKLRDQNIVQALLSEGKETQKTIDK 416 
SEQ ID 4550 (GBS201) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 49 (lane 5; MW 49kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 54 (lane 3; MW 74.5kDa) and in Figure 
62 (lane 8 & 9; MW 74.5kDa). The GBS201-GST fusion product was purified (Figure 209, lane 9) and 
used to immunise mice. The resulting antiserum was used for FACS (Figure 304), which confirmed that the 
protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1481 

A DNA sequence (GBSx1567) was identified in S.agalactiae <SEQ ID 4553> which encodes the amino
acid sequence <SEQ ID 4554>. This protein is predicted to be a rhoptry protein. Analysis of this protein
sequence reveals the following: 

Possible site: 27 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.58 Transmembrane 13 - 29 ( 10 - 31) 
INTEGRAL Likelihood = -1.54 Transmembrane 33 - 49 ( 33 - 49) 

Final Results

bacterial membrane --- Certainty=0.3S33 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4555> which encodes the amino acid 
sequence <SEQ ID 4556>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 115/239 (48%), Positives = 162/239 (67%), Gaps = 3/239 (1%)

Query: 32 EVIATLLIIGGGYCAYYVYD-KKRLKRFTSNQRIEALKSDIKETDQDIRHLEILKKDNRS 90 

+++ + I G GY + V +KRL + +++E LK+ 1+ D+ +R L+ D+ 
Sbjct: 42 DILPAIAIGGTGYAIFRVRSHQKRLAKAKIAKQLEDLKAKIQLADRKVRLLDTYLADHDD 101 

Query: 91 KEYIKLAHQILPQLDLIRNEANQLQKAIEPNIYKRITICKANTFSNEINEQIjIKLHASPEL 150 

+Y LA Q+LPQL 1+ +A L+ ++P IY+RITKKAN ++I QL L + L 
Sbjct: 102 FQYNVIAQQLLPQLSDIKAKAITLKDQLDPQIYRRITKKANDVESDITLQLETLQIATTL 161 

Query: 151 --EPISDQEDEMIRIAPELKPFYHNIQDDHFAILKKIEEADNKAELAAIHQANMKRFTDV 208 

+P+ +1 APELKP+Y NIQ DH AIL KI+ ADN+ EL A+H ANM+RF D+ 

Sbjct: 162 NPQPLKTPSPNLINKAPELKPYYDNIQTDHQAILAKIQGADNQEELLALHDANMRRFEDI 221 

Query: 209 LAGYIRIKQSPKNFNNAKERLEQALQAIKKFNLDLDETLRQIjNESDMKDFDVSLRMMQG 267 

L GY++IK+ PKN+ NA RLEQA QAI++F+ DLDETLR+LNESD+KDFD+SLR+MQG 
Sbjct: 222 LTGYLKIKEEPKNYYNAAARLEQAKQAIQQFDEDLDETLRRLNESDLKDFDISLRIMQG 280 
SEQ ID 4554 (GBS265) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 54 (lane 2; MW 56kDa) and in Figure 62 (lane 6; MW 56.3kDa). 

The GBS265-GST fusion product was purified (Figure 207, lane 5) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 258A) and FACS (Figure 258B). These tests confirm
that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1482 

A DNA sequence (GBSx1568) was identified in S.agalactiae <SEQ ID 4557> which encodes the amino
acid sequence <SEQ ID 4558>. This protein is predicted to be glutamate-cysteine ligase (gshA). Analysis 
of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.70 Transmembrane 575 - 591 ( 575 - 591) 

Final Results 

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG08588 GB:AE004933 glutamate-cysteine ligase [Pseudomonas aeruginosa]
Identities = 142/468 (30%), Positives = 220/468 (46%), Gaps = 62/468 (13%) 

Query: 12 QATFGLERESLRIHQPTQRVAQTPHPKTLGSRNYHPYIQTDYSEPQLELITPI 70
+ G+ERE LR+ ++A TPHP+ LGS HP I TDYSE LE ITP
++ EYLW SMP ++ EE I IA+
Sbjct: 16

Query: 71
Sbjct: 75

Query: 126
Sbjct: 134

Query: 185
Sbjct: 194

Query: 231
Sbjct: 254

Query: 265
Sbjct: 314

Query: 315
Sbjct: 374

Query: 374
Sbjct: 434

C +Q I+GIHYN L + L I, + +4 + D+Q+ Y+ L +NF E
WLL YL+GASP
Y L+ Y++ L AV
D+NPF P+GI
f Q+L+++ G S E F KQ H
There is also homology to SEQ ID 4560.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1483 

A DNA sequence (GBSx1569) was identified in S.agalactiae <SEQ ID 4561> which encodes the amino
acid sequence <SEQ ID 4562>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1504 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB73814 GB:AL139078 helix-turn-helix containing protein 
[Campylobacter jejuni] 
Identities = 107/223 (47%), Positives = 148/223 (65%), Gaps = 7/223 (3%)

Query: 1 MDKEKLDYWKTIITFLHNVLGDNYEIVLHVVDENDIYIGELVNSHISGRTISSPLTTFAL 60
MD+ + + + FL VLG+- YEIV HV+ E+ YI + NSHISGR++ SPLT FA
Sbjct: 1 MDEGQKQQFIKLTYFLGEVLGEQYEIVFHVITEDGAYIAAIANSHISGRSLDSPLTAFAS 60

Query: 61 DLIKNKVYKEKDFTOmCAIVSPMKEVRGSTFFIKNAQNELEGMLCINLDI SAYQNIAL 120
+L++NK Y EKDF+ +YKA+V +K +RGSTFFIKN ++L G+LCIN D S +++
Sbjct: 61

Query: 121
Sbjct: 119

Query: 180
Sbjct: 175

+E LS +I+DI+ tVDSHi
K El KL+EKG+F +KGAV VA+ L ISEPSVYRYLKK 4



A related DNA sequence was identified in S.pyogenes <SEQ ID 4563> which encodes the amino acid 
sequence <SEQ ID 4564>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1636 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 169/224 (75%), Positives = 198/224 (87%), Gaps = 3/224 (1%)

Query: 1 MDKEKLDYWKTIITFLHNVLGDNYEIVLHVVDENDIYIGELVNSHISGRTISSPLTTFAL 60 

MDKE L+ YWKT+ ITFLH+ VLGDNYE I +LHV+D+NDI YIGEL VNSHISGR+ SPLTTFAL 
Sbjct: 1 MDKETLNYWKTVITFLHDVLGDNYEIILHVIDKNDIYIGELVNSHISGRSKQSPLTTFAL 60 

Query: 61 DLIKNKVYKEKDFVTNYKAIVSP™KEWGSTFFIKNaQNEIJEGMLCI^rLDISAYQNIAL 120 

DLI NKVYKEKDFVTNYKAIVSP +KEVRGSTFFIK+ + LEGMLCINLDI SAYQ +A 
Sbjct: 61 DLITNKVYKEKDFVTNYKAIVSPQKKEVRGSTFFIKEKKGNLEGMLCINLDISAYCjGV 120 



Query: 121 DILDLVNLNVNKILP--KSPQKISLPQQEEPVEVLSGKIQDIISEIVDPSLLNQNIHLSQ 178 
D+L LVNHJ+ +P K P+ ++ PQ EE VE+L+ NIQDII +H-DPSLL N+HLSQ 
Sbjct: 121 DLLKLVNIWLEHFIPTAKEPKTVT-PQPEEAVEILTSNIQDIIGQIIDPSLLRHNVHLSQ 179

Query: 179 EVKVEIVSKLHEKGVFQLKGAVSKVAEVLIISEPSVYRYLKKIE 222

+VK++IV+KL+EKGVFQLKGAVSKVA++L ISEPSVYRYLKKIE 
Sbjct: 180 DVKIDIVAKLYEKGVFQLKGAVSKVADILCISEPSVYRYLKKIE 223

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1484 

A DNA sequence (GBSx1570) was identified in S.agalactiae <SEQ ID 4565> which encodes the amino
acid sequence <SEQ ID 4566>. This protein is predicted to be regulatory protein pfoR. Analysis of this 
protein sequence reveals the following: 



Possible site:
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -7..., -2..., -0...
Transmembrane ... - 315 ( 296 - ...)
Transmembrane ... - 188 ( 169 - ...)
Transmembrane ... - 87 ( 66 - ...)
Transmembrane 261 - 277 ( 260 - ...)
Transmembrane ... - 144 ( 127 - ...)
Transmembrane ... - 117 ( 101 - ...)
Transmembrane 198 - 214 ( ...)

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA60239 GB:X86525 pfoS [Clostridium perfringens] 
Identities = 96/147 (65%), Positives = 122/147 (82%)

Query: 100 GTGIIPGFLAGYLVGFLVKWMERNIPGGLDLISIIIIGAPLTRLVAKLLTPLINSTLLTI 159

G GI+PGF+AGYL F++K++E+ IP GLDLI II++GAPL R +A + PL+ +TL I 
Sbjct: 1 GFGILPGFIAGYLGSFVIKFLEKKIPAGLDLIVIIVLGAPLVRGIAAISNPLVETTLQNI 60 

Query: 160 GDILTSGAHSNPILMGIILGGTIVWATAPLSSMALTAMLGLTGMPMAIGALSVFGSSFM 219 

G ++T+ + ++PI+MGIILGG + WATAPLSSMALTAMLGLTG+PMAIGAL+VFGSSFM 
Sbjct: 61 GGVITATSTASPIMMGIILGGIVTWATAPLSSMALTAMLGLTGLPMAIGALAVFGSSFM 120 

Query: 220 NGVLFHKLKLGSRKDNIAFAVEPLTQA 246

N V F K+K GS+KD IA A+EPLTQA 
Sbjct: 121 NLVFFGKMKFGSKKDTIAVAIEPLTQA 147 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4567> which encodes the amino acid 
sequence <SEQ ID 4568>. Analysis of this protein sequence reveals the following: 



Possible site: 37
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -..., -7..., -2...
Transmembrane 303 - 319 ( 296 - ...)
Transmembrane 70 - 86 ( 66 - ...)
Transmembrane 172 - 188 ( 169 - ...)
Transmembrane 261 - 277 ( 260 - 278)
Transmembrane 101 - 117 ( 101 - 119)
Transmembrane 124 - 140 ( 124 - 140)
Transmembrane 198 - 214 ( 197 - 215)

Final Results

bacterial membrane --- Certainty=0.4482 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the databases:

>GP:CAA60239 GB:X86525 pfoS [Clostridium perfringens]
Identities = 95/147 (64%), Positives = 123/147 (83%)

Query: 100 GTGIIPGFVAGYWSFLIKWMEKNIPGGLDLISIIIVGAPLTRFLAQLITPVINSTLLTI 159 

G GI+PGF+AGY+ SF+IK++EK IP GLDLI II++GAPL R +A + P++ +TL I 
Sbjct: 1 GFGILPGFIAGYLGSFVIKFLEKKIPAGLDLIVIIVLGAPLVRGIAAISNPLVETTLQNI 60

Query: 160 GDILTSSANSNPIIMGMILGGTIVWATAPLSSMALTAMLGLTGIPMAIGALSVFGSSFM 219 

G ++T+++ ++PI+MG+ILGG + WATAPLSSMALTAMLGLTG+PMAIGAL+VFGSSFM 
Sbjct: 61 GGVITATSTASPIMMGIILGGIVTWATAPLSSMALTAMLGLTGLPMAIGALAVFGSSFM 120



Query: 220 NGVLFYRLKLGERKDNIAFAIEPLTQA 246

N V F ++K G +KD IA AIEPLTQA 
Sbjct: 121 NLVFFGKMKFGSKKDTIAVAIEPLTQA 147 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 302/339 (89%), Positives = 330/339 (97%)

Query: 1 MNIIIGTSLLILVIAIFTLFNYKAPYGTKAMGAIASAACASFLVEAFQDSFFGKVLGFQF 60

M+IIIGTSLLILVLAIF+LFNYKAP+G KAMGALASAACASFLVEAFQDSFFGKVLGFQF 
Sbjct: 1 MDIIIGTSLLILVLAIFSLFNYKAPHGAKAMGAIASAACASFLVEAFQDSFFGKVLGFQF 60

Query: 61 LSEVGGANGSLSGVAAAILVAIAIGVTPGYAVLIGLSVSGTGIIPGFLAGYLVGFLVKWM 120 

LSEVGGANGSLSGVAAAILVAIAIGV+PGYAVMGIjSVSGTGIIPGF+AGY+V FL+KWM 
Sbjct: 61 LSEVGGANGSLSGVAAAILVAIAIGVSPGYAVI.IGLSVSGTGIIPGFVAGYWSFLIKWM 120 

Query: 121 ERNIPGGLDLISIIIIGAPLTRLVAKLLTPLINSTLLTIGDILTSGAHSNPILMGIILGG 180 
E+NIPGGLDLISIII+GAPLTR +A+L+TP+INSTLLTIGDILTS A+SNPI+MG+ILGG
Sbjct: 121 EKNIPGGLDLISIIIVGAPLTRFLAQLITPVINSTLLTIGDILTSSANSNPIIMGMILGG 180



Query: 181 TIVWATAPLSSMALTAMLGLTGMPMAIGALSVFGSSFMNGVLFHKLKLGSRKDNIAFAV 240
TIVWATAPLSSMRLTAMLGLTG+PMAIGALSVFGSSFMNGVLF++LKLG RKDNIAFA+ 
Sbjct: 181 TIVWATAPLSSMALTAMLGLTGIPMAIGALSVFGSSFMNGVLFYRLKLGERKDNIAFAI 240

Query: 241 EPLTQADVTSANPIPIYVTNFVGGAACGILIALMKLVNDTPGTATPIAGFAVMFAYNPMI 300

EPLTQADVTSANPIPIYVTOFVGGAACG+LIALMia.'SOT3TPGTATPIAGFAVMFAYMP+ 
Sbjct: 241 EPLTQADVTSANPIPIYVTNFVGGAACGVLIALMKLVNDTPGTATPIAGFAVMFAYNPVA 300

Query: 301 KVLITALGCIILSLIAGYFGGIVFKDYKLVTKEELQARD 339 

KVLITALGCII4-SL+ GY GG VFK+Y+LVTK+ELQAR+ 
Sbjct: 301 KVLITALGCIIISLIVGYIGGSVFKNYRLVTKQELQARN 339

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1485 

A DNA sequence (GBSx1571) was identified in S.agalactiae <SEQ ID 4569> which encodes the amino
acid sequence <SEQ ID 4570>. This protein is predicted to be adenylosuccinate synthetase (purA). Analysis
of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0560 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 
Query: 1 MTSWWGTQWGDEGKGKITDFLSADAEVIAEYQGGDNAGHTIVIDHKKFKLHLIPSGIF 60

M+SVVWGTQWGDEGKGKITDFLS 4-AEVIftRYQGG+NAGHTI D +KLHLIPSGIF 
Sbjct: 1 MSSWWGTQWGDEGKGKITDFLSENAEVIflRYQGGNNAGHTIKFDGITYKLHLIPSGIF 60 

Query: 61 FKEKISVIGNGVVVMPKSLVKEIAYmGEGVTTDMLRISDRAHVILPYHIKLDQLQEEiaK 120 

+K+K VIGNG+W+PK+LV ELAYLH V+TDNLRI S +RAHVILPYH+KLD+ + +E+ K 
Sbjct: 61 YKDKTCTIGNGMVVDPKALVTEIAYLHERIWSTDNLRISNRAWILPYHLKIiDEVEEER^ 120 

Query: 121 GDNKIGTTIKGIGPAYMDKAARVGIRIADLLDREVFAERLKINLAEKNRLFEKMYDSTPL 180

G NKIGTT KGIGPAYMDKAAR+GIRIADLLDR+ FAE+L+ NL EKNRL EKMY++ 
Sbjct: 121 GANKIGTTKKGIGPAYMDKAARIGIRIADLLDRDAFAEK-jERNLEEKNRLLEKMYETEGF 180 

Query: 181 EFDDIFEEYYEYGQQIKQYVTDTSVILNDALDAGKRVLFEGAQGVMLDIDQGTYPFVTSS 240

+ +DI +EYYEYGQQIK+YV DTSV+LNDALD G+RVLFEGAQGVMLDIDQGTYPFVTSS 
Sbjct: 181 KLEDII^EYYEYGQQIKKOTCDTSWLNDAtDEGRRVLFEGAQGVMLDIDQGTYPFVTSS 240 

Query: 241 NPVAGGVTIGSGVGPSKINKWGVCKAYTSRVGDGPFPTELFDEVGDRIREIGKEYGTTT 300 

NPVAGGVTIGSGVGP+KI WGV KAYT+RVGDGPFPTEL DE+GD+IRE+G+EYGTTT 
Sbjct: 241 NPVAGGVTIGSGVGPTKIKHWGVSKAYTTRVGDGPFPTELKDEIGDQIREVGREYGTTT 300 

Query: 301 GRPRRVGWFDSWMRHSRRVSGITNLSLNSIDVLSGLDTVKICVAYDLDGKRIDYYPASL 360 

GRPRRVGWFDSW+RH+RRVSGIT+LSLNSIDVL+G++T+KICVAY G+ 1+ +PASL 
Sbjct: 301 GRPRRVGWFDSWVRHARRVSGITDLSLNSIDVLAGIETLKICVAYRYKGEIIEEFPASL 360 



Sbjct: : 

Query: 421 NILESVW 427 

N+L SV+ 
Sbjct: 421 NVLRSVY 427 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4571> which encodes the amino acid
sequence <SEQ ID 4572>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0560 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 406/430 (94%), Positives = 421/430 (97%)

Query: 1 MTSVVWGTQWGDEGKGKITDFLSADAEVIARYQGGDNAGHTIVIDNKKFKLHLIPSGIF 60

MTSWWGTQWGDEGKGKITDFLSADAEVIARYQGGDNAGHTIVID KKFKLHLIPSGIF 
Sbjct: 1 MTSWWGTQWGDEGKGKITDFLSADAEVIARYQGGDNAGHTIVIDGKKFKLHLIPSGIF 60 

Query: 61 FKEKISVIGNGVVVNPKSLVKEIAYLHGEGVTTDNLRISDRAHVILPYHIKLDQLQEDAK 120

F +KI SVIGNGVWNPKSLVKEIjAYIjH EGVTTIJNLRI SDRAHV ILPYHI +LDQLQEDAK 
Sbjct: 61 FPQKISVIGNGVVVNPKSLVKELAYLHDEGVTTDNLRISDRAHVILPYHIQLDQLQEDAK 120

Query: 121 GDNKIGTTIKGIGPAYMDKAARVGIRIADLLDREVFAERLKINLAEKNRLFEKMYDSTPL 180

GDNKIGTTIKGIGPAYM3KAARVGIRIADLLD+++FAERL+IIl3LAEKNRIiFEKMYDSTPIi 
Sbjct: 121 GDNKIGTTIKGIGPAYMDKAARVGIRIADLLDKDIFAERLRINLAEKNRLFEKMYDSTPL 180

Query: 181 EFDDIFEEYYEYGQQIKQYVTDTSVILNDALDAGKRVLFEGAQGVMLDIDQGTYPFVTSS 240

+FD IFEEYY YGQ+IKQYVTDTSVIIJSEALDAGKRVLFEGAQGVMLDIDQGTYPFVTSS 
Sbjct: 181 DFDAIFEEYYAYGQEIKQYVTDTSVILMAIjDAGKRVLFEGAQGVMLDIDQGTYPFVTSS 240

Query: 241 NPVAGGVTIGSGVGPSKINKWGVCKAYTSRVGDGPFPTELFDEVGDRIREIGKEYGTTT 300 
NPVAGGVTIGSGVGP+KINKVVGVCKAYTSRVGDGPFPTELFDEVG+RIRE+G EYGTTT
Sbjct: 241 NPVAGGVTIGSGVGPNKINKWGVCKAYTSRVGDGPFPTELFDEVGERIREVGHEYGTTT 300

Query: 301 GRPRRVGWFDSWMRHSRRVSGITNLSLNSIDVLSGLDTVKICVAYDLDGKRIDYYPASL 360 

GRPRRVGWFDSVVMRHSRRVSGITNLSLNSIDVLSGLDTVKICVAYDLDGKRIDYYPA+L 
Sbjct: 301 GRPRRVGWFDSWMRHSRRVSGITNLSLNSIDVLSGLDTVKICVAYDLDGKRIDYYPANL 360 



Query: : 

EQLKRCKPIYEELPGW edit RSLD+LPENARKYVRRVGELVGVRISTFSVGPGREQT 
Sbjct: 361 EQLKRCKPIYEELPGWQEDITGTOSLDELPENARKYVRRVGELVGVRISTFSVGPGREQT 420 

Query: 421 NILESVWSNI 430 

NILESVW++I 
Sbjct: 421 NILESVWASI 430 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1486 

A DNA sequence (GBSx1572) was identified in S.agalactiae <SEQ ID 4573> which encodes the amino
acid sequence <SEQ ID 4574>. Analysis of this protein sequence reveals the following:

Possible site: 46 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.29 Transmembrane 30 - 46 ( 22 - 55) 
INTEGRAL Likelihood = -2.97 Transmembrane 110 - 126 ( 109 - 126) 
INTEGRAL Likelihood = -0.11 Transmembrane 89 - 105 ( 89 - 106)

Final Results 

bacterial membrane --- Certainty=0.4715 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8823> which encodes amino acid sequence <SEQ ID 8824> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
SRCFLG: 0

McG: Length of UR: 5

Peak Value of UR: 3.05
Net Charge of CR: 0
McG: Discrim Score: 4.64
GvH: Signal Score (-7.5): -1.66
Possible site: 36
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 37
ALOM program count: 2 value: -2.97 threshold: 0.0
INTEGRAL Likelihood = -2.97 Transmembrane 100 - 116 ( 99 - 116)

PERIPHERAL Likelihood = 1.38 56
modified ALOM score: 1.09
icml HYPID: 7 CFP: 0.219

*** Reasoning Step: 3

Final Results 

bacterial membrane --- Certainty=0.2190 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database and no 
corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1487 

A DNA sequence (GBSx1573) was identified in S.agalactiae <SEQ ID 4575> which encodes the amino
acid sequence <SEQ ID 4576>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0967 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1488 

A DNA sequence (GBSx1574) was identified in S.agalactiae <SEQ ID 4577> which encodes the amino
acid sequence <SEQ ID 4578>. This protein is predicted to be SgaT protein (sgaT). Analysis of this protein
sequence reveals the following:



Possible site: 43
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = ..., -7.64, -6.58, -6.48, -5.79, -5.52, -4.78, -2.97, -0.69, -0.16
Transmembrane ... - 457 ( 436 - ...)
Transmembrane ... - 360 ( 339 - ...)
Transmembrane ... - 419 ( 392 - ...)
Transmembrane 237 - 253 ( 235 - ...)
Transmembrane ... - 121 ( 99 - ...)
Transmembrane 138 - 154 ( 137 - ...)
Transmembrane ... - 34 ( 14 - ...)
Transmembrane ... - 381 ( 365 - ...)
Transmembrane ... - 57 ( 41 - ...)
Transmembrane ... - 176 ( 160 - ...)

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.



Query: 11 PSQNILQNPAPFVGLLVLIGYLLLKKPLKDVFAGFIKATVGYLILNVGAGGLVNTPRPIL 70 

F ++ N +G++ +GY+LL+K + + G IK +G+++L G+G L +TF+P++ 
Sbjct: 30 FFNQVMTNAPLLLGIVTCLGYILLRKSVSVIIKGTIKTIIGFMLLQAGSGILTSTFKPW 89

Query: 71 VAIAKKFNLEAAVIDPYFGLASANAKLETMG-FISVATTALLIGFGINILLVALRKVTKV 129 

+++ + + A+ D Y AS A ++ MG S A+L+ +NI V LR++T + 
Sbjct: 90 AKMSEVYGINGAISDTY ASMMATIDRMGDAYSWGYAvLLALALNICYVLLRRITGI 146 

Query: 130 RTLFITGHIMVQQAATISVFVLLDIPQLRNGFGAWAV GIICGLYWAVSSNMTVEAT 185 

RT+ +TGHIM QQA I+V + + G+ W 1+ LYW ++SNM + T 

Sbjct: 147 RTIMLTGHIMFQQAGLIAVTLFIF GYSMWTTIICTAILVSLYWGITSNMMYKPT 200 
Query: 186
Sbjct: 201

Query: 246
Sbjct: 261

Query: 306
Sbjct: 312

Query: 366
Sbjct: 371

Query: 426
Sbjct: 430

Q +T G GF+IGHQQQFA W KVAPF GKKEE++++LKLP +LNIFHD +V++A +M
+FFG IL G D +
h +YILQT +F+V +FI+ QGVRMFV
F +GIA ++ + LA



A related DNA sequence was identified in S.pyogenes <SEQ ID 4579> which encodes the amino acid 
sequence <SEQ ID 4580>. Analysis of this protein sequence reveals the following: 



INTEGRAL Likelihood = ...
Transmembrane 441 - 457
Transmembrane 344 - 360
Transmembrane 238 - 254
Transmembrane 138 - 154
Transmembrane 400 - 416
235 - 26...

Final Results

bacterial membrane --- Certainty=0.5203 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC77150 GB:AE000491 orf, hypothetical protein [Escherichia coli]

Identities = 182/461 (39%), Positives = 273/461 (50%), Gaps = 25/461 (5%)

Query: 1 MEMLLAPLNWFSQNILQNPAFFVGLLVLIGYLLLKKPIYEVFAGFVKATVGYLILNVGAG 60

ME+L F ++ N +G++ +GY+LL+K + + G +K +G+++L G+G 

Sbjct: 20 MEILYNIFTVFFNQVMTNAPLLLGIVTCLGYILLRKSVSVIIKGTIKTIIGFMLLQAGSG 79 

Query: 61 GLVTTFRPILVAIAKKFELKAAVIDPYFGLAAANTKLEEMG-FISVATTALLIGFGVNIL 119 
L +TF+P++ +++ + + A+ D Y + A ++ MG S A+L+ +NI 
Sbjct: 80 ILTSTFKPVVAKMSEVYGINGAISDTYASMMAT---IDRMGDAYSWVGYAVLLALAINIC 136

Query: 120 LVALRKVTKVRTLFITGHIMVQQAATISVFVLLLIPQFQNAFGAWAV GIICGLYWA 175

V LR++T +RT+ +TGHIM QQA I+V++ +W I+ LYW

Sbjct: 137 YVLLRRITGIRTIMLTGHIMFQQAGLIAVTLFIF GYSMWTTIICTAILVSLYWG 190

Query: 176 ISSNMTVEATQRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHD 235

I+SNM + TQ +T G GF+IGHQQQFA W PCVAPF GKKEE++++LKLP +LNIFHD 
Sbjct: 191 ITSNMMYKPTQEVTDGCGFSIGHQQQFASWIAYKVAPFLGKKEESVEDLKLPGWLNIFHD 250



Query: 236 TVVASATLMLVFFGAILAVLGPDIMSDVDLIGPGAFNPAKQAFFMYILQTSLTFSVYLFI 295
+V++A +M +FFGAIL G D + + K + +YILQT +F+V +FI 

Sbjct: 251 NIVSTAIVMTIFFGAILLSFGIDTVQAM AGKVHWTVYILQTGFSFAVAIFI 301 
Query: 296 LMQGVRMFVSELTNAFQGISSKLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITI 355

+ QGVRMFV+EL+ AF GIS +L+PG+ A+D AA Y F + NAV+ GF +G IGQLI + 
Sbjct: 302 ITQGWMFVAELSEAFNGISQRLIPGAV1AIDCAMYSF-APNAWWGFMWGTIGQLIAV 360 

Query: 356 ALLVIFKNPILIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGILQVALGAVAVGL 415

+LV + ILII GF+P+FF NA I V+A+ GGW+AA+ + + G++++ AV L

Sbjct: 361 GILVACGSSILIIPGFIPMFFSNATIGVFANHFGGWRAALKICLVMGMIEIFGCVWAVKL 420

Query: 416 LGLTGGYHGNIDLVLPWLPFGYLFKFLGIAGYVLVCIFLLA 456 

G++ + G D + P F +GIA ++ + LA 
Sbjct: 421 TGMS-AWMGMADWSILAPPMMQGFFSIGIAFMAVIIVIALA 460

An alignment of the GAS and GBS proteins is shown below. 

Identities = 437/476 (91%) , Positives = 457/476 (95%) 

Query: 1 MENLLAPLNWFSQNILQNPAFFVGLLVLIGYLLLKKPLHDVFAGFIKATVGYLILNVGAG 60

ME LLAPLNWFSQNILQNPAFFVGLLVLIGYLLLKKP+++VFAGF+KATVGYLILNVGAG
Sbjct: 1 MEMLLAPLNWFSQNILQNPAFFVGLLVLIGYLLLKKPIYEVFAGFVKATVGYLILNVGAG 60

Query: 61 GLVWTFRPILVAIAKKFNLEAAVIDPYFGLASANAKLETMGFISVATTALLIGFGINILL 120

GLV TFRPILVAIAKKF L+AAVIDPYFGLA+AN KLE MGFISVATTALLIGFG+NILL
Sbjct: 61 GLVTTFRPILVAIAKKFELKAAVIDPYFGLAAANTKLEEMGFISVATTALLIGFGVNILL 120

Query: 121 VALRKVTKVRTLFITGHIMVQQAATISVFVLLLIPQLRNGFGAWAVGIICGLYWAVSSNM 180 

VALRKVTKVRTLFITGHIMVQQAATISVFVLLLIPQ ++N FGAWAVGIICGLYWA+SSNM
Sbjct: 121 VALRKVTKVRTLFITGHIMVQQAATISVFVLLLIPQFQNAFGAWAVGIICGLYWAISSNM 180 

Query: 181 TVEATQRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHDTVVAS 240
TVEATQRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHDTVVAS
Sbjct: 181 TVEATQRLTGGGGFAIGHQQQFAIWFVDKVAPFFGKKEENLDNLKLPTFLNIFHDTVVAS 240

Query: 241 ATLMLVFFGGILAVLGPDIMSNVKLIGPGAFVPTKQAFFMYILQTSLTFSVYLFILMQGV 300 

ATLMLVFFG ILAVLGPDIMS+V LIGPGAF P KQAFFMYILQTSLTFSVYLFILMQGV
Sbjct: 241 ATLMLVFFGAILAVLGPDIMSDVDLIGPGAFNPAKQAFFMYILQTSLTFSVYLFILMQGV 300

Query: 301 RMFVTELTNAFQGISNKLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITIALLVV 360

RMFV+ELTNAFQGIS+KLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITIALLV+ 
Sbjct: 301 RMFVSELTNAFQGISSKLLPGSFPAVDVAASYGFGSSNAVLSGFAFGLIGQLITIALLVI 360

Query: 361 FKNPILIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGIIQVALGAVAVGLLGLAG 420 

FKNPILIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGI+QVALGAVAVGLLGL G 
Sbjct: 361 FKNPILIITGFVPVFFDNAAIAVYADKRGGWKAAVALSFISGILQVALGAVAVGLLGLTG 420

Query: 421 GYHGNIDFEFPWIAFGYIFKYLGIAGYVIVCLFFLAIPQLQFMKSKDKEAYYRGDA 476 

GYHGNID PW+ FGY+FK+LGIAGYV+VC+F LAIPQLQF K+KDKEAYYRG+A
Sbjct: 421 GYHGNIDLVLPWLPFGYLFKFLGIAGYVLVCIFLLAIPQLQFAKAKDKEAYYRGEA 476 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1489 

A DNA sequence (GBSx1575) was identified in S.agalactiae <SEQ ID 4581> which encodes the amino
acid sequence <SEQ ID 4582>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1225 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG34743 GB:AE000033 similar to PTS system: EIIB [Mycoplasma pneumoniae] 
Identities = 40/89 (44%) , Positives = 62/89 (68%) , Gaps = 1/89 (1%) 

Query: 4 VLTACGNGMGSSMVIKMKVENALRQLGVSNFESASCSVGEAKGLAANYDIVVASNHLIHE 63

++ ACGNGMG+SM+IK+KVE +++LG + A S+G+ KG+ + DI+++S HL E 
Sbjct: 8 IIAACGNGMGTSMLIKIKVEKIMKELGYTAKVEA-LSMGQTKGMEHSADIIISSIHLTSE 66 

Query: 64 LDGRTKGHLVGLDNLMDDNEIKTKLQEIL 92 

+ K +VG+ NLMD+NEIK L ++L 
Sbjct: 67 FNPNAKAKIVGVLNLMDENEIKQALSKVL 95

A related DNA sequence was identified in S.pyogenes <SEQ ID 4583> which encodes the amino acid 
sequence <SEQ ID 4584>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0977 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 85/92 (92%) , Positives = 90/92 (97%) 

Query: 1 MVKVLTACGNGMGSSMVIKMKVENALRQLGVSNFESASCSVGEAKGLAANYDIVVASNHL 60

MVKVLTACGNGMGSSMVIKMKVENALRQLGV++ +SASCSVGEAKGLA+ YDIVVASNHL
Sbjct: 1 MVKVLTACGNGMGSSMVIKMKVENALRQLGVTDIQSASCSVGEAKGLASGYDIVVASNHL 60

Query: 61 IHELDGRTKGHLVGLDNLMDDNEIKTKLQEIL 92 

IHELDGRTKGHLVGLDNLMDDNEIKTKLQE+L 
Sbjct: 61 IHELDGRTKGHLVGLDNLMDDNEIKTKLQEVL 92

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1490 

A DNA sequence (GBSx1576) was identified in S.agalactiae <SEQ ID 4585> which encodes the amino
acid sequence <SEQ ID 4586>. This protein is predicted to be a pentitol phosphotransferase enzyme II, A
component (ptxA). Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3309 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC77152 GB:AE000491 putative PTS system enzyme II A component 
[Escherichia coli K12] 
Identities = 64/150 (42%) , Positives = 97/150 (64%) , Gaps = 2/150 (1%) 

Query: 1 MNLKQAFIENDSIRLKLSASDWKEAIKLSIDPLIESGAVDAEYYDAIIESTEEFGPYYIL 60

M L+ + EN SIRL+ A W+EA+K+ +D L+ + V+ YY AI++ E+FGPY+++
Sbjct: 1 MKlRDSIJffiNKSIRLQAEAETWQEAVKIGVDLLVAADVVEPRYYQAILDGVEQFGPYFVI 60 

Query: 61 MPGMAMPHARPEAGVKRDAFSLITLTEPVVF--PDGKEVSVLLALAATSSAIHTSVAIPQ 118
PG+AMPH RPE GVK+ FSL+TL +P+ F D V +L+ +AA + H V I Q
Sbjct: 61 APGLAMPHGRPEEGvKKTGFSLOTLKKPLEFNHDDNDPVDIM 120 

Query: 119 IIALFELENSIQRLTECQEAKEVLAMVEES 148
I+ LFE E + RL C+ +EVL +++ +

Sbjct: 121 IVNLFEDEENFDRLRACRTEQEVLDLIDRT 150 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4587> which encodes the amino acid
sequence <SEQ ID 4588>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2287 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 113/161 (70%) , Positives = 137/161 (84%) 

Query: 1 MNLKQAFIENDSIRLKLSASDWKEAIKLSIDPLIESGAVDAEYYDAIIESTEEFGPYYIL 60

MNLKQAFI+N+SIRL LSA W+EA++L++ PLI+S AV + YYDAII STE++GPYY+L
Sbjct: 1 MNLKCAFIDNNSIRLGIjSADTWQEAWIiAVQPLIDSKAVTSAYYDAIlASTEKVGPYYVL 60 

Query: 61 MPGMAMPHARPEAGVKRDAFSLITLTEPVVFPDGKEVSVLLALAATSSAIHTSVAIPQII 120

MPGMAMPHA GV R+AF+LITLT+PV F DGKEVSVLL LAAT + IHT+VAIPQI+
Sbjct: 61 MPGMAMPHAEAGLGVNRNAFALITLTKPVTFSDGKEVSVLLTLAATDPSIHTTVAIPQIV 120



Query: 121 ALFELENSIQRLTECQEAKEVLAMVEES

ALFEL+N+I+RL CQ KEVL MVEES
Sbjct: 121 ALFELDNAIERLVACQSPKEVLEMVEES

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1491

A DNA sequence (GBSx1577) was identified in S.agalactiae <SEQ ID 4589> which encodes the amino
acid sequence <SEQ ID 4590>. This protein is predicted to be a hexulose-6-phosphate synthase.
Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1584 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC77153 GB:AE000491 probable hexulose-6-phosphate synthase
[Escherichia coli K12] 
Identities = 108/217 (49%) , Positives = 141/217 (64%) , Gaps = 3/217 (1%)



Query: 5 LPNLQVALDHSDLQGAIKAAVSVGHEVDVIEAGTVCLLQVGSELVEVLRSLFPDKIIVAD 64 

LP LQVALD+ + A + + EVD+IE GT+ + G V L++L+P KI++AD 
Sbjct: 3 LPMLQVALDNQTMDSAYETTRLIAEEVDIIEVGTILCVGEGVRAVRDLKALYPHKIVLAD 62 

Query: 65 TKCADAGGTVAKNNAVRGADWMTCICCATIPTMEAALKAIKEERGDRGEIQIELYGDWTY 124

K ADAG +++ ADW+T ICCA I T + AL KE GD +QIEL G WT+ 

Sbjct: 63 AKIADAGKILSRMCFEMADWVTVICCMIOTAK^^ 119 
Query: 125 EQAQQWLDAGISQAIYHQSRDALIAGETWGEKDLNKVKKLIDMGFRVSVTGGLSTDTLQL 184

EQAQQW DAGI Q +YH+SRDA AG WGE D+ +K+L DMGF+V+VTGGL+ + L L 
Sbjct: 120 EQAQQWRDAGIGQVVYHRSRDAQAAGVAWGEMITAIIOILSDMGFIWTVTGGLALEDLPL 179 

Query: 185 FEGVDVFTFIAGRGITEADDPAAAARAFKDEIKRIWG 221 

F+G+ + FIAGR I +A P AAR FK I +WG 
Sbjct: 180 FKGIPIHVFIAGRSIRDAASPVEAARQFKRSIAELWG 216 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4591> which encodes the amino acid 
sequence <SEQ ID 4592>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1473 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 206/217 (94%) , Positives = 212/217 (96%) 

Query: 5 LPNLQVALDHSDLQGAIKAAVSVGHEVDVIEAGTVCLLQVGSELVEVLRSLFPDKIIVAD 64

+PNLQVALDHSDLQGA+KAAV+VGHEVDVIEAGTVCLLQVGSELVEVLRSLFP+KIIVAD 
Sbjct: 4 IPNLQVALDHSDLQGAVKAAVAVGHEVDVIEAGTVCLLQVGSELVEVLRSLFPEKIIVAD 63

Query: 65 TKCADAGGTVAKNNAVRGADWMTCICCATIPTMEAALKAIKEERGDRGEIQIELYGDWTY 124

TKCADAGGTVAKNNA RGADWMTCICCATIPTMEAALKAIKEERGDRGEIQIELYGDWTY 
Sbjct: 64 TKCADAGGTVAKNNAKRGADWMTCICCATIPTMEAALKAIKEERGDRGEIQIELYGDWTY 123

Query: 125 EQAQQWLDAGISQAIYHQSRDALIAGETWGEKDLNKVKKLIDMGFRVSVTGGLSTDTLQL 184

EQAQ WLDAGISQAIYHQSRDALLAGETWGEKDLNKVK LIDMGFRVSVTGGL DTL-lL 
Sbjct: 124 EC^QLWLDAGISQAlYHQSRDALIAGETWGEKDR^KVlCrLIDMGFRVSOTGGLDTOTLRIi 183 

Query: 185 FEGVDVFTFIAGRGITEADDPAAAARAFKDEIKRIWG 221 

FEGVDVFTFIAGRGITEA+DPAAAARAFKDEIKRIWG
Sbjct: 184 FEGVDVFTFIAGRGITEAEDPAAAARAFKDEIKRIWG 220 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1492 

A DNA sequence (GBSx1578) was identified in S.agalactiae <SEQ ID 4593> which encodes the amino
acid sequence <SEQ ID 4594>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4179 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC22686 GB:U32783 hexulose-6-phosphate isomerase, putative
[Haemophilus influenzae Rd] 
Identities = 143/282 (50%) , Positives = 199/282 (69%) , Gaps = 3/282 (1%) 

Query: 5 IGIYEKATPKHFNWLERLQFAKELGFDFVELSIDESDERLARLEWSKEERLELVKAIFET 64 

IGIYEKA PK+ W ERL AK GF+F+E+SIDES++RL+RL W+K ER+ L ++I ++
Sbjct: 6 IGIYEKALPKNITWQERLSLAPACC-FEFIEMSIDESNDRLSRLNlfTKSERIALHQSIIQS 65 
Query: 65
Sbjct: 66
Query: 125
Sbjct: 126
Sbjct: 185
Query: 245
Sbjct: 246

++GIR IQLAGYDVYYE+4

ATA AQV L++EIMD PFK+SI ++ + I+SP+ VYPD G

NY G FLIEMW+E



A related DNA sequence was identified in S.pyogenes <SEQ ID 4595> which encodes the amino acid 
sequence <SEQ ID 4596>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1489 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 240/286 (83%), Positives = 271/286 (93%) 

Query: 1 MTRPIGIYEKATPKHFNWLERLQFAKELGFDFVELSIDESDERLARLEWSKEERLELVKA 60

M RPIGIYEKATPK F W ERLQFAK+LGFDFVE+S+DESD RLARLEW+KEERL+LVKA
Sbjct: 15 MARPIGIYEKATPKQFTWRERLQFAKDLGFDFVEMSVDESDARLARLEWTKEERLDLVKA 74 

Query: 61 IFETGTOVPTITFSGHRRFPMGSNNPEKFJ^AMDMMKKCIVFAQDIGIRNIQLAGYDVYY 120 

I+ETG+R+PTI FSGHRR+P+GSN+P EA+++ +MK+CI AQD+G+R IQLAGYDVYY 
Sbjct: 75 IYETGIRIPTICFSGHRRYPLGSNDPAIEAKSLKLMKQCIELAQDLGVRTIQLAGYDVYY 134 

Query: 121 EEKSPETRARFIKNLRQACTWAEEAQVILSIEIMDDPFMNSIEKYLAVEKEIDSPYLFVY 180

E+KSPETRARFIKNLRQ+C WAEEAQV+LSIEIMDDPF+NSIEKYLAVEKEIDSPYLFVY 
Sbjct: 135 EKKSPETRARFIKNLRQSCDWAEEAQVMLSIEIMDDPFINSIEKYLAVEKEIDSPYLFVY 194 

Query: 181 PDTGNVSAWHNDLWSEFYNGHRSIAALHIKDTYAVTETSKGQFRDVPFGQGCVDWEEMFA 240

PD GIWSAWHNDLWSEFYNGH+SIAALH+KDTYAVTETSKGQFRDVPFGQGCVDW+E+FA 
Sbjct: 195 PDAGWSAWTOtDLWSEFYNGHKSIAALHLKDTYAVTETSKGQFRDVPFGQGCVDWQELFA 254 

Query: 241 VIKKTNYNGPFLIEMWSENCETVEETRAAIKEAQDFLYPLMEKTGV 286

V+KKTNYNGPFLIEMWSENC+TVEET+AAIKEAQDFLYPL+EK G+
Sbjct: 255 VLKKTNYNGPFLIEMWSENCDTVEETKAAIKEAQDFLYPLIEKAGL 300

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1493 

A DNA sequence (GBSx1579) was identified in S.agalactiae <SEQ ID 4597> which encodes the amino
acid sequence <SEQ ID 4598>. This protein is predicted to be an L-ribulose 5-phosphate 4-epimerase.
Analysis of this protein sequence reveals the following: 
Final Results

bacterial cytoplasm --- Certainty=0.2559 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD45716 GB:AF160811 L-ribulose 5-phosphate 4-epimerase 
[Bacillus stearothermophilus] 
Identities = 143/229 (52%), Positives = 176/229 (76%), Gaps = 2/229 (0%) 

Query: 5 LQEMRERVCEANKSLPVHSLVKFTWGNVSEVDREAGLIVIKPSGVDYDQLTPENMVVTDL 64

L+E+++ V EAN LP + LV FTWGNVS +DRE GL+VIKPSGV YD+LT ++MVV DL
Sbjct: 2 LEELKQAVLEANLQLPQYRLVTFTWGNVSGIDRERGLVVIKPSGVAYDKLTIDDMVVVDL 61

Query: 65 EGNIVEGDLNPSSDLPTHVQLYKAWPEVGGIVHTHSTEAVGWAQAGRDIPFYGTTHADYF 124

GN+VEGDL PSSD PTH+ LYK +P +GGIVHTHST A WAQAG+ IP GTTHADYF 
Sbjct: 62 TGNVVEGDLKPSSDTPTHLWLYKQFPGIGGIVHTHSTWATVWAQAGKGIPALGTTHADYF 121

Query: 125 YGPVPCARSLSEDEVNTAYEKETGSVIIEEFERRDLDPMAVPGIVVRNHGPFTWGKDPAQ 184

YG +PC R ++ +E+ AYE ETG VI E F R LDP+ +PG++V HGPF WGKDPA 
Sbjct: 122 YGEIPCTRPMTNEEIQGAYELETGKVITETF--RFLDPLQMPGVLVHGHGPFAWGKDPAN 179

Query: 185 AVYHSVVLEEVAKMNRFTEQINPRVEPAPKYIMDKHYLRKHGPNAYYGQ 233

AV+++VVLEEVAKM T +NP +P + ++D+HYLRKHG NAYYGQ
Sbjct: 180 AVHNAVVLEEVAKMAARTYMLNPNAKPISQTLLDRHYLRKHGANAYYGQ 228 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4599> which encodes the amino acid 
sequence <SEQ ID 4600>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2257 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 207/234 (88%), Positives = 220/234 (93%) 

Query: 1 MAKSLQEMRERVCEANKSLPVHSLVKFTWGNVSEVDREAGLIVIKPSGVDYDQLTPENMV 60

MAK+LQEMRERVC ANKSLP H LVKFTWGNVSEV RE G IVIKPSGVDYD LTPENMV 
Sbjct: 1 MAKNLQEMRERVCAANKSLPQHGLVKFTWGNVSEVCRELGRIVIKPSGVDYDLLTPENMV 60

Query: 61 VTDLEGNIVEGDLNPSSDLPTHVQLYKAWPEVGGIVHTHSTEAVGWAQAGRDIPFYGTTH 120

VTDL+GN+VEGDLNPSSDLPTHV+LYKAWPEVGGIVHTHSTEAVGWAQAGRDIPFYGTTH
Sbjct: 61 VTDLDGNVVEGDLNPSSDLPTHVELYKAWPEVGGIVHTHSTEAVGWAQAGRDIPFYGTTH 120

Query: 121 ADYFYGPVPCARSLSEDEVNTAYEKETGSVIIEEFERRDLDPMAVPGIVVRNHGPFTWGK 180

ADYFYGPVPCARSL++ EV+ AYE+ETG+VI+EEF +R LDPMAVPGIVVRNHGPFTWGK
Sbjct: 121 ADYFYGPVPCARSLTKAETOGAY3QETGM/ILEEFSKRGLDPMAVPGIVVRNHGPFTWGK 180 

Query: 181 DPAQAVYHSVVLEEVAKMNRFTEQINPRVEPAPKYIMDKHYLRKHGPNAYYGQK 234

P QAVYHSVVLEEVA+MNR TEQINPRVEPAP+YIMDKHYLRKHGPNAYYGQK
Sbjct: 181 TPEQAVYHSVVLEEVARMNRLTEQINPRVEPAPRYIMDKHYLRKHGPNAYYGQK 234 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1494

A DNA sequence (GBSx1580) was identified in S.agalactiae <SEQ ID 4601> which encodes the amino
acid sequence <SEQ ID 4602>. This protein is predicted to be a transaldolase (tal). Analysis of this protein
sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4232 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10149> which encodes amino acid sequence <SEQ ID
10150> was also identified.



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB98962 GB:U67539 transaldolase [Methanococcus jannaschii]
Identities = 124/214 (57%) , Positives = 157/214 (72%) 

Query: 19 MKYFLDTADVSEIRRLNRLGIVDGVTTNPTIISREGRDFKEVINEICQIVDGPVSAEVTG 78
MK+FLDTA+V EI++ LG+VDGVTTNPT++++EGRDF EV+ EIC+IV+GPVSAEV

Sbjct: 1 MKFFLDTANVEEIKKYAELGLVDGVTTNPTLVAKEGRDFYEVVKEICEIVEGPVSAEVIS 60

Query: 79 LTCDEMVTEAREIAKWSPNVVVKIPMTEEGIAAVSQLSKEGIKTNVTLIFTVAQGLSAMK 138
+ MV EARE+AK + N+V+KIPMT++G+ AV LS EGIKTNVTL+F+ Q L A K
Sbjct: 61 TDAEGMVKEAREIAKLADNIVIKIPMTKDGMKAVVILSAEGIKINVTLVFSPLQALVAAK 120

Query: 139 AGATFISPFVGRLEDIGTDAYALIRDLRHIIDFYGFQSEIIAASIRGLAHVEGVAKCGAH 198 

AGAT++SPFVGRL+DIG LI D+ I Y ++E+I AS+R HV AK GA

Sbjct: 121 AGATVVSPFVGRLDDIGHVGMKLIEDVVKIYKNYDIKTEVIVASVRHPWHVLEAAKIGAD 180


Query: 199 IATIPDKTFASLFTHPLTDKGIETFLKDWDSFKK 232 

IAT+P LF HPLTD G+E FLKDWD + K 

Sbjct: 181 IATMPPAVMDKLFNHPLTDIGLERFLKDWDEYLK 214 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4603> which encodes the amino acid
sequence <SEQ ID 4604>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1902 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 162/214 (75%) , Positives = 180/214 (83%) 

Query: 19 MKYFLDTADVSEIRRLNRLGIVDGVTTNPTIISREGRDFKEVINEICQIVDGPVSAEVTG 78
MK+FLDTA+V+ I+ +N LG+VDGVTTNPTIISREGRDF+ VI EIC IVDGP+SAEVTG
Sbjct: 1 MKFFLDTANVAAIKAINELGVVDGVTTNPTIISREGRDFETVIKEICDIVDGPISAEVTG 60



Query: 79 LTCDEMVTEAREIAKWSPNVVVKIPMTEEGIAAVSQLSKEGIKTNVTLIFTVAQGLSAMK 138

LT D MV EAR IAKW NVVVKIPMT EGL A + LSKEGIKTNVTLIFTV+QGL AMK
Sbjct: 61 LTADAMVEFJffiSIAKWHDNVVVKIPMTTEGLKATNILSKEGIKIWTLIFTVSCjGLMAMK 120 

Query: 139 AGATFISPFVGRLEDIGTDAYALIRDLRHIIDFYGFQSEIIAASIRGLAHVEGVAKCGAH 198 

AGAT+ISPF+GRLEDIGTDAY LI DLR IID Y FQ+EIIAASIR AHVE VAK GAH 
Sbjct: 121 AGATYISPFIGRLEDIGTDAYQLISDLREIIDLYDFQAEIIAASIRTTAHVEAVAKLGAH 180 
Query: 199 IATIPDKTFASLFTHPLTDKGIETFLKDWDSFKK 232

IATIPD FA + HPLT G++TF++DW SFKK
Sbjct: 181 IATIPDPLFAKMTQHPLTTNGLKTFMEDWASFKK 214 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1495 

A DNA sequence (GBSx1581) was identified in S.agalactiae <SEQ ID 4605> which encodes the amino
acid sequence <SEQ ID 4606>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1263 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14129 GB:Z99115 transcriptional regulator (LacI family)
[Bacillus subtilis] 
Identities = 108/331 (32%) , Positives = 188/331 (56%) , Gaps = 12/331 (3%) 

Query: 6 TISDIANLVGVSKATOSYYMGOTK1<MSLQTKEKIRIAIKETGYQPSKIAQSLVTKNTRT 65
TI D+A GVSK+TVS Y+NG +S + + I+ AI E Y+PSK+AQ L K ++

Sbjct: 10 



Query: 126 DSVDPNHSFIETLSNDRL--VMVDRQAKDIKTDTVASDNKESTQIFLEKMQEAGYHDIYF 183

++ N + + ++ +++DR+ D+K+DTV +DN+ T+ L+K+ GY D+ 
Sbjct: 130 NATGENKDVLRAFAEQQIPTILIDRKLPDLKLDTVTTDNRWITKEILQKVYSKGYTDVAL 189

Query: 184 VTYPIEG1STRELRYEGFKEWS - SNPDKLI I ITE -DGSTQRILDI IEHSEQKP 235 

T PI IS R R ++E+ S H + L+ + ED + L E EQK 

Sbjct: 190 FTEPISSISPRAERAAVYQEMASVQNVWGLTOLHEIDVKDKEQLKAELRSFHKEMPEQKK 249 

Query: 236 GFLM^GPTLLNFMKKLNQSTVSYPEDYGLGSYEDLEWMQVLTPNVSCIKQDSYGIGCIiA 295 

L +NG +L + + + + P+D G+ ++D EW +++ P ++ I Q S+ +G A
Sbjct: 250 AIIALNGLIMLKIISCMEELGLRIPQDIGIAGFDDTEWYPCLIGPGITTIAQPSHDMGRTA 309 

Query: 296 AQCLIEKISQGNEPTTARLLEVKNQIVIRQS 326 

+ + + +E++ ++++R+S 

Sbjct: 310 MERVLKRIE--GDKGAPQTIELEAKVIMRKS 338

There is also homology to SEQ ID 2366. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1496 

A DNA sequence (GBSx1582) was identified in S.agalactiae <SEQ ID 4607> which encodes the amino
acid sequence <SEQ ID 4608>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence
Final Results

bacterial cytoplasm --- Certainty=0.1661 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1497 

A DNA sequence (GBSx1583) was identified in S.agalactiae <SEQ ID 4609> which encodes the amino
acid sequence <SEQ ID 4610>. This protein is predicted to be a glycerate dehydrogenase.
Analysis of this protein sequence reveals the following: 

n signal seq 



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB50351 GB:AJ248287 GLYCERATE DEHYDROGENASE [Pyrococcus abyssi]
Identities = 123/325 (37%) , Positives = 192/325 (58%) , Gaps = 8/325 (2%) 

Query: 1 MDKKKILVTGIVPKEGLRKLMDRPDVTYSED-EPFSRDYVLEHLSEYDGWLLM-GQKGDK 58
M K ++ +T +P+ G+ L F+V ED R R+ +LE + + D + M ++ D+
Sbjct: 1 MSKPRVFITREIPEVGIEMLEKEFEVBVWEDEREIPREILLEKVKDVDALVTMLSERIDR 60

Query: 59 EMIDAGENLQIISLNAVGFDHVDTAYAKEKGIIVSNSPQAVRVPTAEMTFALILAASKRL 118
E+ + L+I++ AVG+D++D A ++GI V+N+P + TA++ FAL+LA ++ L
Sbjct: 61 EVFERAPRLRIVANYAVGYDNIDVEEATKRGIYVTNTPGVLTDATADLAFALLLATARHL 120

Query: 119 AFYDSIVRSGEW IDPSEQRYQGLTLQGSTLGIYGMGRIGLTVANFAKAFGMTVVYN 174
D RSGEW + + + G + G T+GI G GRIG +A A+ F M ++Y
Sbjct: 121 VKGDKFTRSGEWKKRGVAWHPICWFLGYDVYGKTIGIIGFGRIGQAIABCRARGFDMRILYY 180

Query: 175
Sbjct: 181
Query: 235
Sbjct: 240
Query: 295
Sbjct: 300

MK + LIN

+ALI+ALKEG IAGAGLDV+E EP +E L SLDNV+++PH G+ T

+A+ A+N+IAF G+ P +VN+



There is also homology to SEQ ID 124. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
Example 1498

A DNA sequence (GBSx1585) was identified in S.agalactiae <SEQ ID 4611> which encodes the amino
acid sequence <SEQ ID 4612>. Analysis of this protein sequence reveals the following: 



N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.1898 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1499 

A DNA sequence (GBSx1586) was identified in S.agalactiae <SEQ ID 4613> which encodes the amino
acid sequence <SEQ ID 4614>. This protein is predicted to be a PTS system galactitol-specific IIC
component. Analysis of this protein sequence reveals the following:



Possible site: 25
>>> Seems to have no N-terminal signal sequence
INTEGRAL Transmembrane 254 - 270
INTEGRAL Transmembrane 342 - 358 ( 342 - 359)
INTEGRAL Transmembrane 308 - 324 ( 308 - 324)

Final Results

bacterial membrane --- Certainty=0.6307 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8825> which encodes amino acid sequence <SEQ ID 8826> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
MCG: Discrim Score: 8.30 
GvH: Signal Score (-7.5): 2.97 

Possible site: 58 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 9 value: -13.27 threshold: 



INTEGRAL Transmembrane 321 - 337 ( 312 - ...)
INTEGRAL Transmembrane ... - 150 ( 138 - ...)
INTEGRAL Transmembrane ... - 450 ( 431 - ...)
INTEGRAL Transmembrane ... - 115 ( 93 - ...)
INTEGRAL Transmembrane ... - 269 ( 249 - ...)
INTEGRAL Transmembrane ... - 241 ( 218 - ...)
INTEGRAL Transmembrane ... - 362 ( 343 - ...)
INTEGRAL Transmembrane ... - 425 ( 409 - ...)
INTEGRAL Transmembrane 375 - 391 ( 375 - ...)

modified ALOM score: 3.15
Final Results

bacterial membrane --- Certainty=0.6307 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB03909 GB:AP001507 PTS system, galactitol-specific enzyme II,
C component [Bacillus halodurans]

Identities = 92/347 (26%) , Positives = 173/347 (49%) , Gaps = 15/347 (4%) 

Query: 1 MVKTTGLHLPIVDIGWQAGSLTAFSSEIGLSFEVFGLLIELGLFLLGITRVFVPSNLWNN 60 
MV G+ L ++D+GW A S A++S + GL++ + + + T+ + ++WN 

Sbjct: 70 MVDRLGVDLNVIDVGWPATSSIAWASWAAFIIPLGLIVWIMLVTKTTKT-MNVDIWNF 128

Query: 61 FGYMIWGT^YAATGNFILSFAFMVFVLLYSLVMSEVLADRWSEYYGVKNATINSIHNIE 120 

+ Y + Y + + I + V + +L +++ A SE+Y + +1 + I 

Sbjct: 129 raYTFMAAWYTVSDSIIQALIAAVMFQIVALKVADWTAPMVSEFYELPGVSIATGSTIS 188 


Query: 121 TLIPALILDPLVmLLGWKVKLNPESLKTKDGIFGEPMTIiGFILGVIIGVLGSLRlJIASI 180 

+4- + + G+ +P++++ + GIFGE + +G ILG IG+L 
Sbjct: 189 YAPGIWLWGIQKIPGIKHWNADPDTIQRRFGIFGESIFIGLILGAAIGLLAGYNV 244 

Query: 181 DTWGGILGFAVALAAVMTIFPLITGVFASAFAPLAEAVERNKKKESQAEQGALDKKRWFI 240
G ++ 4A+AAVM + P 4 + P++E+ K +1 
Sbjct: 245 GEVIEIGMAMAAVMVLMPRMVICILMSGLMPVSESAREWLiNlCR FGDREIHI 294

Query: 241 AVDDGVGFGEPATIIAGLILVPIMWISLILPGNEALPWDLIAIPFMIEAMIAVSKGNI 300 
+D V G P+ I LILVP+ V++++ILPGN LP DL IPF++ ++ ++GNI

Sbjct: 295 GLDAAVLLGHPSVISTALILVPLT^/IjLAVILPGNALLPFGDLATIPFIVAFIVGAARGNI 354 

Query: 301 LKAILNGIIWFSLGLYAASALGPIYTEAVKHFGTALPAGVTLIMSFN 347 
+ ++L G I +L LY A+ + P++T+ ++ +P G LI S + 
Sbjct: 355 IHSVLAGAIMIALSLYMATDIAPVFTKMAENSNFNMPEGSALISSID 401

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1500 

A DNA sequence (GBSx1587) was identified in S.agalactiae <SEQ ID 4615> which encodes the amino
acid sequence <SEQ ID 4616>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1013 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
Example 1501

A DNA sequence (GBSx1588) was identified in S.agalactiae <SEQ ID 4617> which encodes the amino
acid sequence <SEQ ID 4618>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1294 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10147> which encodes amino acid sequence <SEQ ID
10148> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76604 GB:AE000435 L-xylulose kinase, cryptic [Escherichia
coli K12] 

Identities = 156/496 (31%) , Positives = 261/496 (52%) , Gaps = 18/496 (3%) 

Query: 16 YYLSIDVGGTNTKALIFDKLGHQIAVSSFErLKNETQSGHRQVNLVKTWNAITSAIREVl 75 

Y+L +D GG+ KA ++D+ G+V QG+++ + W +IR++ 

Sbjct: 4 YWLGLDCGGSWLKAGLYDREGREAGVQRLPLCALSPQPGWAERDMAELWQCCMAVIRALL 63 

Query: 76 QISKLSPEQISAVACIGHGKGLYLLDNKLEPLEQGILSTDNRAKDIAQYFESK--LDNIW 133

S +S EQI + GKGL+LLD +PL ILS+D RA +4 + ++ + ++ 

Sbjct: 64 TIISGVSGEQIVGIGISAQ^KGLPLLDKNDKPLGNAILSSDRRAMEIVRRWQEDGIPEKLY 123 

Query: 134 ELTRQHIFPSQSPVILRWLKDYQPETYKSIGAVLSAKDFIRYKLTGKVQQEYGDASGNHW 193 

LTRQ ++ +LRWLK+++PE Y IG V+ D++R+ LTG E + S ++ 

Sbjct: 124 PLTRQTLWTGHPVSLLRWLKEHEPERYAQIGCVMMTHDYLRWCLTGVKGCEESNISESNL 183

Query: 194 INFQTGTYDPAILDFFGIREIENSLPELIDSADLVPGGISSQAAKETGLVEGTPVVGGLF 253

N G YDP + D+ GI EI ++LP ++ SA++ G I++Q A TGL GTPVVGGLF
Sbjct: 184 YNMSLGEYDPCLTDWLGIAEINHALPPVVGSAEIC-GEITAQTAALTGLKAGTPVVGGLF 242

Query: 254 DIDACALGSGVLESDTFSVISGTWNINT--YPSI.KPAKQDSGLMTSYFPDRRYLLEASSP 311

D+ + AL +G+ + T + + GTW + + L+ + + Y D +++ +SP 

Sbjct: 243 DWSTALCAGIEDEFTUStAVMGTWAVTSGITRGLRDGFJUlPYVYGRYVNDGEFIVHEASP 302

Query: 312 

TS+GNL 

Sbjct: 303



Query: 372 ASACFFGLTTKSTKSQMIRAVYEGIAFAHKQHITDLIKSRGSVPKIIRFSGGATKTSPAWM 431 

++ F+G+ T++ +++A+YEG+ F+H H+ + ++ R + +R +GG +S WM 
Sbjct: 352 MTSGFYGMQAIHTRAHLLQAIYEGWFSHMTHri-I^RERFTDVHTLRVTGGPAHSDVWM 410 

Query: 432 QMFSDIENFPIETVEGTELGGLGGAILARHALDKI-SLKEAVQDMVRVKAIYKPQLSEVK 490 
QM +D+ IE + EG G A+ AR + EA +D4 P ++ + 

Sbjct:

Query: 491 GYKKKYHAYQKLLETL 506 

Y+KKY YQ L+ h 
Sbjct: 471 LYQKKYQRYQHLIAAL 486 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
Example 1502

A DNA sequence (GBSx1589) was identified in S.agalactiae <SEQ ID 4619> which encodes the amino
acid sequence <SEQ ID 4620>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have an uncleavable N-term signal seq



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 23 QVQLIKLVKDLGFSRFEIRQELLQDPDRELPALKaEADFYDINLYYSANEDLIK-GGKVN 81 

Q + L+ G R E+R+EL P + KL A + +S+ +L + G++N 

Sbjct: 23 QASFLPLLAMAGAQRVELREELFAGPP-DTEALTAAIQLQGLECVFSSPLELWREDGQLN 81 

Query: 82 PYLNKGLKEASQLGAPFIKLNVGQTRNLSKEELEPLKEILKSQTIGIKVENNQDPKAATV 141 

PL L+ A GA ++K+++G + +L L L + + VEN+Q P+ + 

Sbjct: 82 PELEPTLRRAEACGAGWLKVSLGLLPE--QPDLAALGRRLARHGLQLLVE1SIDQTPQGGRI 139 

Query: 142 ENCQYFMTLVKELQIPISFVFDTANWAF1NQDLYQAVNNLACDTTYLHCKNFIQVAGKPH 201 

E + F h + Q+ ++ FD NW + Q +A h Y+HCK 1 + 

Sbjct: 140 EVLERFFRLAERQQLDLAMTFDIG^roRWQEQAMEAALRLGRWGYVHCKAVIRNRDGKL 199 

Query: 202 LSKSLFEGE INLTD - LLKS FSNCEYLALEYPTE LEILKRDVQRLISISNSQ 251 

++ ++ LL+ F A+EYP + L + +R+L + Q 

Sbjct: 200 VAVPPSAADLQYMQRLLQHFPEGVARAIEYPLQGDDLLSLSRRHIAALARLGQPQ 254 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for vaccines or diagnostics.



Example 1503 

A DNA sequence (GBSx1590) was identified in S.agalactiae <SEQ ID 4621> which encodes the amino
acid sequence <SEQ ID 4622>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.0430 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 4 LDKKSYDLLFYLLKLEEPETVMAIANALNQSRRKVYYHLEKINDALPSDVPQIVSYPRV- 62

IiD++S +L tiL + + LN SRR VY LEKIN Ii + V R 

Sbjct: 3 LDQRSTFILTQLLHARSYLPIQELTQKLNVSRRTVYNDLEKINSWLEEQGLKAVYKVRSQ 62 

Query: 63 GILLTEKQKAACRLLLDEVTDYSYVMKSSERLQIiSIjVSIWAKDRVTIDRLMQLNDVSRN 122 

G++L E+ K L + + Y + ER ++ ++ + + +4- LM VSRN 

Sbjct: 63 GLILDERAKEEIPTKLRSLKSWHYEYSAQERKAWWIYLLTRLEPLFLEHLMDRTGVSRN 122 
T ++D+ L+ EL

-KEYEAAKVMSFKLEQAFGVHYPDEEVLYLT 2

Query: 352
YR +YG+ N + E IK Y ELF +T V LE+ + D++VA++T+H G +
Sbjct: 349 FYRIKYGLE^miAESIKTSYPELFLLTRKWHYLERYVGKSVNDNKVAFITMHFVGWM 408

Query: 412 RNSQQSPNK-LKLVIVSDEGIAIQKLLLKQCQRYLTNSDIEAVFTTEQYQSVSDLMHVDM 470
R P K K +IV G+ + L Q + DI + +Y+ + VD
Sbjct: 409 RREGTIPTKRKKALIVCANGVGTSQFLKNQLEGLFPAVDIIKTCSIREYEKTP--VEVDF 466

Query: 471 WSTSDALESRFPMLWHPVLTDDDIIRMR 501
++ST+ E P+ +V+P+LT+ + RL++
Sbjct: 467 IISTTSIPEKNVPIFIVNPILTETEKERLLK 497

A related DNA sequence was identified in S.pyogenes <SEQ ID 4623> which encodes the amino acid 
sequence <SEQ ID 4624>. Analysis of this protein sequence reveals the following: 

o N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0745 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 368/548 (67%), Positives = 456/548 (83%) 

Query: 1 MIILDKKSYDLLFYLLKlEEPETVmiANAI^QSRRKVYYHLEKINDALPSDVPQIVSYP 60 

M+ILDKKSYDLL YLLKLE PETVMAI++ALNQSRRKVYY L+KIN ALP V QI+SYP 
Sbjct: 1 MMIIDICKSYDLLSYLLKLETPETVI^ISHALNQSRRKVYYQLDKINQALPKGVDQIISYP 60 

Query: 61 RVGILLTEKQKAACRLLLDEVTDYSYVMKSSERLQLSLVSIWAKDRVTIDRLMQLNDVS 120 

R+GILLT QKAACRLLL+EVTDY+YVMKS ER +LS + I V+ +RVTID+LMQ+NDVS 
Sbjct: 61 RLGILLTADQKAACRLLLEEVTDYNYVMPSDERRRLSSIYIAVSTERVTIDKLMQINDVS 120 

Query: 121 ROTILNDLNELRSELAEPCEYNLQLQSTKCRGYFLDGHPIjSIIQYLYKLLDDIYHNGSSSF 180 

RNTILNDL ELR EL +K+Y +QL +TK RGY+ HP+++IQYLYKLL D+Y G++SF 
Sbjct: 121 RNTIIMLTELREELEDKQYKIQLHATKARGYYFGCHPMALIQYLYKLLVDVYQGGNTSF 180 

Query: 181 IDLFNHKLSQAFGASTYFSKEVLDYFHHYLFISQRSLGKKINSQDGQFMIQILPFILMAY 240 

ID+FN KLS+ G S YFSK++L YFH YLF+SQ SLGK IN+QD QFM+QILPF+L++Y 
Sbjct: 181 IDIFNRKLSEIQGLSVYFSKDILTYFHEYLFLSQASLGKTINTQDSQFMLQILPFMLLSY 240 

Query: 241 RKMRLSPEVQTSMSDFSLWQRKEYEIAKFJjADELEFJSIFQLSLDEIEVGLVAMIjMLSFR 300

R MRL E +++L +F L+W+RKEY IA++LA EL NF+L LD+IEV +VAMLMLSFR 
Sbjct: 241 RNMRLDSETKSALKQEFHLIWKRKETYHIAQDLARELYHNFKLHLDDIEVSMVAMLMLSFR 300 

Query: 301 KDRDNHLESQDYDDMRATLTSFLICELEERYELHF 1 7HKKDLLRQLLTHCKALLYRKRYGIF 360 

KD+D+H+ESQDYDDMRAT++ F+ +LE RY LHF HK+DLL++L THCKAL+YRK YGIF 
Sbjct: 301 KDQDHHVESQDYDDMRATISHFIDQLESRYQLHFTHKQDLLKRLTTHCKALVYRKAYGIF 360
VNPLT+H+K+KYEELFA+T S +LE4- W I LTDDD+AYLTIHLGGELR++

KLVIVSD+GI IQKLL KQCQRYL N IEAVFTTEQYQSV DL+ VDM+V+T+D L++

PML+V+P+L+DDDII+LIRFSK+G + ++F+ EL K I VK++S+RY L SKIE



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
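Each "Final Results" block above lists the localization sites with a certainty value and a call in parentheses, and the site called "Affirmative" is the one with the highest certainty. As a minimal sketch, assuming only the three-line layout printed in these examples, the following Python reads such a block and sorts the sites by certainty. It parses printed output only; it does not run PSORT, and `FINAL_RESULT` and `parse_final_results` are hypothetical names of our own. The pattern also tolerates the stray spaces seen in the scanned numbers (for example "0 . 0745").

```python
import re

# One match per "Final Results" line; spaces inside the number are allowed
# and stripped before conversion (the scan shows values like "0 . 0745").
FINAL_RESULT = re.compile(
    r"bacterial\s+(?P<site>[a-z]+)\W*Certainty=\s*(?P<score>\d\s*\.\s*\d+)"
    r"\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(text: str) -> list[tuple[str, float, str]]:
    """Return (site, certainty, call) tuples, highest certainty first."""
    hits = [
        (m["site"], float(re.sub(r"\s+", "", m["score"])), m["call"].strip())
        for m in FINAL_RESULT.finditer(text)
    ]
    # sorted() is stable, so ties keep their printed order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


example_1503_gas = """
bacterial cytoplasm --- Certainty=0 . 0745 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
"""

print(parse_final_results(example_1503_gas)[0])  # ('cytoplasm', 0.0745, 'Affirmative')
```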

Example 1504 

A DNA sequence (GBSx1591) was identified in S.agalactiae <SEQ ID 4625> which encodes the amino
acid sequence <SEQ ID 4626>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.2692 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC77149 GB:AE000491 orf, hypothetical protein [Escherichia coli K12]
Identities = 211/363 (58%), Positives = 270/363 (74%), Gaps = 9/363 (2%) 

Query: 1 MPNVKDITRESWILSTFPEWGTWLNEEIEEEWAEGNFAMWWLGNCGVWIKTPGGANVVM 60 

M VK ITRESWILSTFPEWG+WLNEEIE+E VA G FAMWWLG G+W+K+ GG NV + 
Sbjct: 3 MSKVK3ITRESWILSTFPEWGSWLNEEIEQEQVAPGTFAMWWLGCTGIWLKSEGGTNVCV 52 

Query: 61 DLWSNRGKSTKKVKDMVRGHQMANMAGVRKLQPNLRAQPMVIDPFAINELDYYLVSHFHS 120

D W GK + M +GHQM MAGV+KLQPNLR P V+DPFAI ++D L +H H+ 

Sbjct: 63 DFWCGTGKQSHGNPLMKQGHQMQRmGVKKLQPNLRTTPFVLDPFAIRQIDAVLATHDHN 122 

Query: 121 DHIDINTAAAIINNPNLDHVKFVGPYECGEIVIKKI'JGVPEERIIVIKPGESFEFKDIKVTA 180 

DHID+N AAA++ N D V F+GP C ++W WGVP+ER IV+KPG+ + KDI++ A 
Sbjct: 123 DHIDVNVAAAVMQNC-ADDVPFIGPKTCTOLYnGWGVPKERCIWKPGDWKVKDIElHA 181 

Query: 181 VESFDRTCLVTLPVDGAEEHDGEIAGIAVTDEEMARKAVNYIFETPGGTIYHGADSHFSN 240 

+++FDRT L+TLP D + AG V + M +AVNY+F+TPGG++YH DSH+SN 

Sbjct: 182 LDAFDRTALITLEADQ- KRAG--VLPDGMDDRAVNYLFKTPGGSLYHSGDSHYSN 233 

Query: 241 YFAKHGKDYKIDVAINNYGDNPVGIQDKMTSIDLLRMAENLRAKVIIPVHYDIWSNFMAS 300

Y+AKHG +++IDVA+ +YG+NP GI DKMTS D+LRM E L AKV+IP H+DIWSNF A 
Sbjct: 234 YYAKHGNEHQIDVALGSYGENPRGITDfflTSADMLRMGEALNAICWIPFHHDIWSNFQAD 293 

Query: 301 TDEILQLWKMRKERLQYDFHPFIWEVGGKYTYPQDKDRIEYHHPRGFDDCFEQESNIQFK 360 

EI LW+M+K+RL+Y F PFIW+VGGK+T+P DKD EYH+PRGFDDCF E ++ FK 
Sbjct: 294 PQEIRVLWEMKKDRLKYGFKPFIWQVGGKFTWPLDKDNFEYHYPRGFDDCFTIEPDLPFK 353 

Query: 361 ALL 363 
+ L 

Sbjct: 354 SFL 356 
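The percentages in a summary line such as "Identities = 211/363 (58%), Positives = 270/363 (74%), Gaps = 9/363 (2%)" are the counts divided by the alignment length. The Python sketch below rebuilds that line. That the percentage is truncated rather than rounded is our inference, not something the source states: elsewhere in this document "Positives = 494/785 (62%)" appears, where 494/785 is about 62.9% and rounding would give 63%. `blast_percent` and `summary_line` are hypothetical helper names.

```python
def blast_percent(count: int, length: int) -> int:
    """Percentage of `count` over `length`, truncated to an integer.

    Truncation (not rounding) is an assumption inferred from the reports.
    """
    return (100 * count) // length


def summary_line(identities: int, positives: int, gaps: int, length: int) -> str:
    """Rebuild a BLAST-style Identities/Positives/Gaps summary line."""
    def part(label: str, n: int) -> str:
        return f"{label} = {n}/{length} ({blast_percent(n, length)}%)"

    return ", ".join([part("Identities", identities),
                      part("Positives", positives),
                      part("Gaps", gaps)])


print(summary_line(211, 270, 9, 363))
# Identities = 211/363 (58%), Positives = 270/363 (74%), Gaps = 9/363 (2%)
```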
A related DNA sequence was identified in S.pyogenes <SEQ ID 4627> which encodes the amino acid
sequence <SEQ ID 4628>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3298 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 315/363 (86%), Positives = 348/363 (95%) 
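An Identities count is the number of aligned columns in which the query and subject carry the same residue. The Python sketch below counts it for one row pair, using the Query/Sbjct 301-360 rows of this GAS/GBS alignment as printed. It also builds a midline that shows only identical residues; BLAST additionally marks positively scoring substitutions with "+", which needs a substitution matrix, so that part is deliberately left out. Function names are hypothetical.

```python
def compare_rows(query: str, sbjct: str) -> tuple[int, int]:
    """Return (identical columns, aligned residue pairs) for two alignment rows.

    Both rows must have the same length; '-' marks a gap.
    """
    if len(query) != len(sbjct):
        raise ValueError("alignment rows must have the same length")
    pairs = [(q, s) for q, s in zip(query, sbjct) if q != "-" and s != "-"]
    identical = sum(1 for q, s in pairs if q == s)
    return identical, len(pairs)


def identity_midline(query: str, sbjct: str) -> str:
    """Letters where the rows agree, spaces elsewhere (no '+' marks)."""
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, sbjct))


query = "TDEILQLWKMRKERLQYDFHPFIWEVGGKYTYPQDKDRIEYHHPRGFDDCFEQESNIQFK"
sbjct = "TDEILELWKMRKERLQYDFHPFIWEVGGKYTYPQDQNRIEYHHPRGFDDCFLEDSNIQFK"
print(compare_rows(query, sbjct))  # (54, 60)
print(identity_midline(query, sbjct))
```

Summing these per-row counts over every row of an alignment gives the numerator of the Identities line.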



Query: 1 MPNVKDITRESWILSTFPEWGTWLNEEIEEEWAEGNFAMWWLGNCGVWIKTPGGANVVM 60
M V+DITRESWIL+TFPEWGTWLNEEIE+EW NFAMWWLGNCG+WIKTPGGANWM
Sbjct: 1 MTKVQDlTRESWIIm , FPEWGTWI^EIEQEWPAD^FA^m T LGNCGIVIIKTPGGAWVM 60

Query: 61 DLWSNRGKSTKKVKDMVRGHQMANMAGVRKLQPNLRAQPMVIDPFAINELDYYLVSHFHS 120
DLWSNRGK+TK+VKDMVRGHQMANMAG RKLQPNLRAQPMVIDPF INELDYYLVSH+HS
Sbjct: 61 DLMSNRGKATKQVKDMWGHQMANMAGARKLQPNLRAQPMVIDPFMINELDYYLVSHYHS 120

Query: 121 DHIDINTAAAIINNPNLDHVKFVGPYECGEIWKKWGVPEERIIVIKPGESFEFKDIKVTA 180
DHIDINTAAAIINNP L+HVKFVGPYECGE+WK WGVP++RI+++KPG+SFEFKDIK+TA
Sbjct: 121 DHIDINTAAAIINNPKLNHVKFVGPYECGEVWKNWGVPKDRIMILKPGDSFEFKDIKITA 180

Query: 181 VESFDRTCLVTLPVDGAEEHDGELAGLAVTDEEMARKAVNYIFETPGGTIYHGADSHFSN 240
VESFDRTCLVTLP+ GA+ DG+LAGLA+TD++MARKAVNYIFETPGGTIYHGADSHFSN
Sbjct: 181 VESFDRTCLVTLPIC<3ADAQDGDIAGIAITDDDMARKAVNYIFETPGGTIYHGADSHFSN 240

Query: 241 YFAKHGKDYKIDVAINNYGDNPVGIQDKMTSIDLLRMAENLRAKVIIPVHYDIWSNFMAS 300
YFAKHG+DY IDV +1^G4-HP+GIQDKMTS+DLI J R^EOTJU«V+IPVHYDIWSHFMAS
Sbjct: 241 YFAKHGRDYDIDVVLNNYGENPIGIQDKMTSVDLLRMAE 300

Query: 301 TDEILQLWKMRKERLQYDFHPFIWEVGGKYTYPQDKDRIEYHHPRGFDDCFEQESNIQFK 360

TDEIL+LWKMRKERLQYDFHPFIWEVGGKYTYPQD++RIEYHHPRGFDDCF ++SNIQFK
Sbjct: 301 TDEILELWKMRKERLQYDFHPFIWEVGGKYTYPQDQNRIEYHHPRGFDDCFLEDSNIQFK 360

Query: 361 ALL 363
ALL

Sbjct: 361 ALL 363

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1505

A DNA sequence (GBSx1592) was identified in S.agalactiae <SEQ ID 4629> which encodes the amino
acid sequence <SEQ ID 4630>. Analysis of this protein sequence reveals the following:



Final Results 

bacterial cytoplasm --- Certainty=0.3988 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10145> which encodes amino acid sequence <SEQ ID 
10146> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 
>GP:BAA18808 GB:D90917 hypothetical protein [Synechocystis sp.]
Identities = 358/785 (45%), Positives = 494/785 (62%), Gaps = 15/785 (1%)

Query: 22 LEKiDAWWRAANYISAAQMYLKDNPLLPJffi^^ 81 

L ++ +WRAANY++ +YL+DNPLLR L +K +GHWG+ PG +F+Y HLNR 
Sbjct: 44 IJNQ^lHGFWRAA]SreIlM?GMIYLRDNPLLREPLQPEQIKHRLLGHWG3SPGISFL'YTHIMlI 103 

Query: 82 INKYDLDMFYIEGPGHGGQVMVSNSYLDGSYTELNPNIEQTEDGFKQLCKIFSFPCGIAS 141 

I K+D DM Y+ GPGHG + YL+GSY+ + EDG K+ K FSFP GI S 

Sbjct: 104 IRKFDQDMLYMVGPGHGAPGFLGPCYLEGSYSRFFAECSEDEDGMKRFFKQFSFPGGIGS 163

Query: 142 HAAPETPGSIHEGGELGYALSHATCAILDNPDVIAATVIGDGEGETGPLMAGWLSNTFIN 201 

H PETPGSIHEGGELGY LSHA GA DNP++I + GDGE ETGPL W SN FIN 
Sbjct: 164 HCTPETPGSIHEGGELGYCLSHAYGAAFDNPNLIWGLAGDGESETGPIATSVIHSNKFIN 223 

Query: 202 PVMDGAVLPIFYLNGGKIHNPTIFERKTDEELSQFFEGLGWKPIFADWELSEDHAAAHA 261 

P+ DGAVLP+ +LNG KI+NP++ R + EEL FEG G+ P F + D + H 

Sbjct: 224 PIRDGAVLPVLHLNGYKINNPSvT,SRISHEELKKLFEGYGYTPYFVE GSDPESMHQ 279 

Query: 262 LFAEKLDQAIQEHCTIQSEARQKPAEEAIQAKFPVLVARIPKGWTGPKAWEGTPIEGGFR 321 

A LD + EI IQ EAR A++ ++P+4V R PKGWTGP +G +EG +R 

Sbjct: 280 AMAATLDHCVSEIHQIQQEARSTGI - -AWPRWPMVVMRTPKGWTGPDYVDGHKVEGFWR 337 

Query: 322 AHQVPIPVDAHHMEHVDSLLSWLQSYRPEELFDENGKIVDEIAAISPKGDRRMSMNPITN 381 

+HQVP+ + H+ L +W++SY+PEELFDE G + AI+P+GD+R+ P N 

Sbjct: 338 SHQVPMGGMHENPAHLQQLEAWMRSYKPEELFDEQGTLKPGFKAIAPEGDKRLGSTPYAN 397 

Query: 382 AGIV-KAMDTADWKKFALDINVPGQIMAQDMrEFGKYAADLVDANPDNFRIFGPDETKSN 440 

G++ + + D++++ +D++ PG I A + G + D++ N NFR+FGPDE SN 
Sbjct: 398 GGLLRRGLKMPDFRQYGIDvTDQPGTIFAPOTAPLGVFLRDVMANNMTNFRLFGPDENSSN 457 

Query: 441 RLQEVFTRTSRQWLGRRKPDYDEA- - LSPAGRVIDSQLSEHQAEGFLEGYVLTGRHGFFA 498 

+ L v+ + +, W+ + + LSP GRV++ LSEH EG+LE Y+LTGRHGFFA 

Sbjct: 458 KLHAWEVSKKFWIAEYLEEDQDGGELSPDGRVMB-MLSEHTLEGWLEAYLLTGRHGFFA 516 

Query: 499 SYESF]^WDS^OTQHFKOT^RKSKTHTTWRK^^^PALNLIAASTVFQQDHNGYTHQDPGIL 558 

+YESF V+ SMV QH KWL + H WR + +LN++ STV++QDHNG+THQDPG L 
Sbjct: 517 TYES FAHVI TSMVNQHAKWLDI CR - HLNWRADISSLNILMTSTVWRQDHNGFTHQDPGFL 575 

Query: 559 THLAEKTPEYIREYLPADTNSLLAVMDKAFKAEDKINLIVTSKHPRPQFYSIAEAEELVA 618

+ K+P+ +R YLP D NSLL+V D ++++ IN+IV K Q+ + A 
Sbjct: 576 DVILNKSPDVTOIYLPPDVKSLLSVADHCLQSKNYINIIVCDKQAHLQYQDMTSAIRNCT 635 

Query: 619 EGYKVIDWASWSLNQEPDWFAAAGTEPNLEALAAISILHKAFPELKIRFVNVLDILKL 678 

+ G + +WASN EPDW AAAG P EALAA ++L + FP L+IRFV+V+D+LKL 

Sbjct: 636 KGVDIWEWASN -DAGTEPD vWiAAAGDI PTKEALAATAMLRQFFPNLRI RFVSVI DLLKL 694 

Query: 679 RHPSQDARGLSDEEFNKVFTTDKPVIFAFHGYEDMIRDIFFSRHNH-NLHTHGYRENGDI 737 

+ S+ GLSD +F+ +FTTDKP+IF FH Y +1 + + R NH NLH GY+E G+I 
Sbjct: 695 QPESEHPHGLSDRDFDSLFTTDKPIIFNFHAYPWLIHRLTYRRTNHGNLHVRGYKEKGNI 754 

Query: 738 TTPFDMRVMSELDRFHLAQDA- -ALASLGNKAQAFSDEriNQMVAYHKDYIREHGDDIPEV 795 

TP D+ + +++DRF IAD LI> ++M +Y EHG D+PE+ 

Sbjct: 755 NTPMDLAIQNQIDRFSLAIDVIDRLPQLRVAGAHIKEMLKDMQIDCTNYAYEHGIDMPEI 814 

Query: 796 QNWKW 800 
NW+W 

Sbjct: 815 VNWRW 819 
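The number at the end of each alignment row is the coordinate of the last residue shown, so a gap character takes up a column without consuming a residue. As an illustrative sketch (the helper name `row_end` is hypothetical), the Python below recomputes the end coordinates of the Query 679 and Sbjct 695 rows of the Synechocystis alignment above; the query row contains one gap, which is why it spans 60 columns but only 59 residues.

```python
def row_end(start: int, row: str) -> int:
    """Coordinate of the last residue in an alignment row.

    Gap characters ('-') occupy a column but consume no residue, so only
    letters are counted.
    """
    residues = sum(1 for ch in row if ch.isalpha())
    return start + residues - 1


query_row = "RHPSQDARGLSDEEFNKVFTTDKPVIFAFHGYEDMIRDIFFSRHNH-NLHTHGYRENGDI"
sbjct_row = "QPESEHPHGLSDRDFDSLFTTDKPIIFNFHAYPWLIHRLTYRRTNHGNLHVRGYKEKGNI"
print(row_end(679, query_row), row_end(695, sbjct_row))  # 737 754
```

The next query row then starts at 737 + 1 = 738, matching the "Query: 738" row.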

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1506

A DNA sequence (GBSx1593) was identified in S.agalactiae <SEQ ID 4631> which encodes the amino
acid sequence <SEQ ID 4632>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3509 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 5 LEVKNLTKIFGKKQKAALEMVKQGKSKTEILEKTGATVGVYDASFEIKEGEIFVIMGLSG 64

+++++LTKIFGK+ K AL MV++G+ K EIL+KTGATVGVYD +FEI EGEIFVIMGLSG 
Sbjct: 5 IKIEHLTKIFGKRIKTALT^KGEPKSEILKICrGATVGVYDTNFEINEGEIFVIMGLSG 64 

Query: 65 SGKSTLVRMLNRLIDPSSGNIYLDGKDIAKMNVEDLRNIRRHDINMVFQNFGLFPHRTIL 124

SGKSTL+R+LNRLI+P+SG I++D +D+A +N EDL +RR ++MVFQNFGLFPHRTIL 
Sbjct: 65 SGKSTLLRLLNRLIEPTSGKIFIDNQDVATRIKEDLLQVRRKTMSMVFQNFGLFPHRTIL 124 

Query: 125 ENTEFGLEMRGVSKEERTTLAEKALDNAGLLPFKDQYPSQLSGGMQQRVGLARALANSPK 184

ENTE+GLE++ V KEER AEKALDNA LL FKDQYP QLSGGMQQRVGLARALAN P+ 
Sbjct: 125 EOTEYGLEVQWPKEERRKRAEKALDNANLLDFKDQYPKQLSGGMQQRVGLARAIA1TOPE 184 

Query: 185 ILLMDEAFSALDPLIRREMQDELLDLQDTNKQTIIFISHDLNEALRIGDRIALMKDGEIM 244 

ILLMDEAFSALDPLIRREMQDELL+LQ ++TIIF+SHDLNEALRIGDRIA+MKDG+IM 
Sbjct: 185 ILLMDFAFSALDPLIRREMQDEIjLELQAKFQICTIIFVSHDLNEALRIGDRIAIMKDGKIM 244 

Query: 245 QIGTGEEILTNPANDFVKEFVEDVDRSKVLTAQNIMIKPLTTVLEIDGPQVALTRMHREE 304 

QIGTGEEILTNPAND+V+ FVEDVDR+KV+TA+NIMI LTT +++DGP VAL +M EE 
Sbjct: 245 QIGTGEEILTOPANDWKTFVEDVDRAKVITAENIMIPALTraiDVDGPSVALKKMKTEE 304 

Query: 305 VSMLMATNRRRQLLGSLTADAAIEARKKDLPLSEVIDKDVVTVSKDTVITDIMPLIYDSS 364

VS LMA +++RQ G +T++ AI ARK + PL +V+ DV TVSK+ ++ DI+P+IYD+ 
Sbjct: 305 VSSLMATOKKRQFRGVVTSEQAIAARKNNQPI^WTDVGWSKEMLVRDILPIIYDAP 364 

Query: 365 APIAVTDDNDRLLGVIIRGRVIEALANVQDETVVESPKETVE 406

P+AV DDN L GV+IRG V+EALA++ DE VE ++ E 
Sbjct: 365 TPLAWDDNGFLKGVLIRGSVLEAIiADI PDEDEVEEIEKEEE 406 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4633> which encodes the amino acid 
sequence <SEQ ID 4634>. Analysis of this protein sequence reveals the following: 



o N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3761 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 344/395 (87%), Positives = 374/395 (94%)

Query: 1 MTNILEVKNLTKIFGKKQKAALEMVKQGKSKTEILEKTGATVGVYDASFEIKEGEIFVIM 60

M ILEVK+L+KIFGKKQKAALEMVK GK+K+EI +KTGATVGVYDASFE+K+GEIFVIM 
Sbjct: 1 METILEVKHLSKIFGKKQKAALEMIHCTGKNKSEIFKKTGAWGVYDASFEVKKGEIFVIM 60 


Query: 61 GLSGSGKSTLVRMLNRLIDPSSGNIYLDGKDIAKMNVEDLRNIRRHDINMVFQNFGLFPH 120
Query: 121 RTILENTEFGLEMRGVSKEERTTLAEKALDNAGLLPFKDQYPSQLSGGMQQRVGLARALA 180

+TILENTEFGLE+RGV KEER LAEKALDN+GLL FKDQYP+QLSGGMQQRVGLARALA
Sbjct: 121 KTILENTEFGLELRGVPKEERQRLAEKALDNSGLLDFKDQYPKQLSGGMQQRVGLARALA 180

Query: 181 NSPKILLMDEAFSALDPLIRREMQDELLDLQDTKKQTIIFISHDLNEALRIGDRIALMKD 240 

NSPKILLMDEAFSALDPLIRREMQDELLDLQD+ KQTIIFISHDLNEALRIGDRIALMKD
Sbjct: 181 NSPKILLMDEAFSALDPLIRREMQDELLDLQDSMKQTIIFISHDLNEALRIGDRIALMKD 240

Query: 241 GEIMQIGTGEEILTNPANDFVKEFVEDVDRSKVLTAQNIMIKPLTTVLEIDGPQVALTRM 300

G+IMQIGTGEEIL1NPANDFWEFVEDVDRSKVLTAQNIMIKPLTT +E+DGPQVAL RM 
Sbjct: 241 GQIMQIGTGEEILTNPANDFVREFVEDVDRSKVLTAQNIMIKPLTTTVELDGPQVAMIRM 300 

Query: 301 HREEVSMLMATNRRRQLLGSLTADAAIEARKKDLPLSEVIDKDVVTVSKDTVITDIMPLI 360

H EEVSMLMATNRRRQL+GSLTADAAIEARKK LPLSEVID+DV TVSKDT+ITDI+PLI 
Sbjct: 301 HNEEVSMLMATNRRRQLVGSLTADAAIEARKKGLPLSEVIDRDVRTVSKDTIITDILPLI 360

Query: 361 YDSSAPIAVTDDNDRLLGVIIRGRVIEALANVQDE 395

YDSSAPIAVTDDN+RLLGVIIRGRVIEALAN+ DE
Sbjct: 361 YDSSAPIAVTDDNNRLLGVIIRGRVIEALANISDE 395

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1507 

A DNA sequence (GBSx1594) was identified in S.agalactiae <SEQ ID 4635> which encodes the amino
acid sequence <SEQ ID 4636>. This protein is predicted to be OpuABC (opuAB). Analysis of this protein 
sequence reveals the following: 
Possible site: 41

N-terminal signal sequence

INTEGRAL Likelihood =-10 Transmembrane 48 - 64 ( 43 - 72)
INTEGRAL Likelihood = -9 Transmembrane 101 - 117 ( 93 - 122)
INTEGRAL Likelihood = -7 Transmembrane 296 - 312 ( 290 - 316)
INTEGRAL Likelihood = -6. Transmembrane 252 - 268 ( 250 - 273)
INTEGRAL Likelihood = -5 Transmembrane 141 - 157 ( 138 - 170)
INTEGRAL Likelihood = -0 Transmembrane 220 - 236 ( 220 - 237)

Final Results

bacterial membrane --- Certainty=0.5267 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MENLLQHKLPVAPFVESTTNWITKTFSGLFDFIQTIGNALMDWMTKTLLFINPLLFIVLI 60 

M +L ++P+A +V S T+WIT TFS FD IQ G LM+ +T L + L I ++ 
Sbjct: 1 MIDLAIGQVPIANWVSSATDWITSTFSSGFDVIQKSGTVLMNGITGALTAVPFWLMIAW 60 

Query: 61 TIAVFFLAKKKWQLPTFTFIGLLFIYNQGLWEQLINTFNLVLVASLISIIIGVPLGIWMA 120

TI ++ KK P FTFIGL I NQGLW L++T LVL++SL+SI I IGVPLGIWMA 
Sbjct: 61 TIIAILVSGKKIAFPLFTFIGLSLIANQGLWSDLMSTITLVLLSSLLSIIIGVPLGIWMA 120 

Query: 121 KSDKVKQVVNPILDFMQTMPAFVYLIPAVAFFGIGMVPGVFASVVFALPPTVRFTNLAIR 180

KSD V ++V PILDFMQTMP FVYLIPAVAFFGIG+VPGVFASV+FALPPTVR TNL IR 
Sbjct: 121 KSDLVAKIVQPILDFMQTMPGFVYLIPAVAFFGIGVVPGVFASVIFALPPTVRMTNLGIR 180



Query: 181 EIPLELIEASDSFGSTVKQKLFKVELPLAKNTIMAGINQTMMLALSMVVTGSMIGAPGLG 240

++ EL+EA+DSFGST +QKLFK+E PLAK TIMAG+NQT+MLALSMW SMIGAPGLG 
Sbjct: 181 QVSTELVEAADSFGSTARQKLFKLEFPLAKGTIMAGVNQTIMLALSMVVIASMIGAPGLG 240

M+ G A +KV + Y+ WDSEVAS NV+ + +K G+DV+ T

LDNAV WQTVANG AD SAWLP TH

NSIE+L+NQA+K ITGIEPGAG+M +

D LGP+++ K+G WP YMNV

Y NL WKL+ +S+GAMT LG+AIK

+VITGWS PHWMP KYDLKYL DPK + G E+INTI RK LKK+ P+ YK++DKF W

T +DME++MLD+ G P +AA+ WIK+H+KEV +W K



A related DNA sequence was identified in S.pyogenes <SEQ ID 4637> which encodes the amino acid 
sequence <SEQ ID 4638>. Analysis of this protein sequence reveals the following: 



Possible site: 47
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood Transmembrane 252 - 268 ( 250 - 273)
INTEGRAL Likelihood Transmembrane 141 - 157
INTEGRAL Likelihood Transmembrane 295 - 311 ( 289 - 315)
INTEGRAL Likelihood Transmembrane ( 220 - 237)
INTEGRAL Likelihood Transmembrane
INTEGRAL Likelihood

Final Results

bacterial membrane --- Certainty=0.4545 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



Query: 8 KLPVAQLVEQLTEWLTKTFSGLFDIMQWGSFLMDKMTKTLLFIHPLLFIVLVTAGMFFL 67 

++P+A V T+W+T TFS FD++Q G+ LM+ +T L + L I +VT + 
Sbjct: 8 QVPIANWVSSATDWITSTFSSGFDVIQKSGTVLMNGITGALTAVPFWLMIAVVTILAILV 67 

Query: 68 AKKKWPLPTFTLLGLLFIYNQGLmQLMNTFTLVLVASLISVLIGIPLGIWMAKNATVRQ 127 

+ KK P FT +GL I NQGLW LM+T TLVL++SL+S++IG+PLGIWMAK+ V + 
Sbjct: 68 SGKKIAFPLFTFIGLSLIANQGLWSDLMSTITLVLLSSLLSIIIGVPLGIWMAKSDLVAK 127 

Query: 128 IvNPILDFMQTMPAFVYLIPAVAFFGIGMVPGVFASVIFALPPTVRFTNLAIRDIPTELI 187 

IV PILDFMQTMP FVYLIPAVAFFGIG+VPGVFASVIFALPPTVR TNL IR + TEL+ 
Sbjct: 128 IVQPILDFMQTMPGFVYLIPAVAFFGIGWPGVFASVIFALPPTVRMTNLGIRQVSTELV 187 

Query: 188 FASDAFGSTGKQKLFKVELPIAHWIMAGVNC/T^^ 247 

EA+D+FGST +QKLFK+E PLAK TIMAGVNQT+MLALSMW SMIGAPGLGR VL+A+ 
Sbjct: 188 EAADSFGSTARQKLFKLEFPLAKGTIMAGVNQTIMLALSMVVIASMIGAPGLGRGVLAAV 247

Query: 248 QHADIGSGFVSGLALVILAIVLDRMTQLFNSKPQEKAKAGKTNKW IGLAALAVFLIA 304 

Q ADIG GFVSG++LVIIAI++DR TQ N P EK KW I L +L +1 

Sbjct: 248 QSADIGKGFVSGISLVILAIIIDRFTQKMIVSPLEKQGNPTVKKWKRGIALVSLLALIIG 307
Query: 305 ALGRGIMAMTSGMADKGETVNIAY^/QTOSEVASTWIAE^/LH^GYHVTLTPLDNAVMWQ 364

A M+ G + V++ Y+ WDSEVAS +V+ + +K G+ V T LDNAV WQ 

Sbjct: 308 AFS GMSFGKTASDKKOTLTO-INWDSEVASIK\ r LTQAlt<EHGFDVKTTALDNAVAWQ 363 

Query: 365 OTANGNADFSTSAWLPVTHGQQYQICYKSKLDDLGPNLKGTKLGLAVPKYMTDVNSIEDLS 424 

TVANG AD SAWLP TH Q+QKY +D LGPNIiKG K+G VP YM +WSIEDL+ 
Sbjct: 364 TVANGQADGMVSAWLPMTHKTQWQKYGKSVDLIfiPNLKGAKV'GFWPSYM-IS^SIEDLT 422 

Query: 425 KQADQKITGIEPGAGIMAAAQKTLKEYHNLSSWELVAASTGAMTTSLDQAIKKKDPIWT 484 

QA++ ITGIEPGAG+MAA++KTL Y NL W4LV +S+GAMT +L +AIK+ IV+T 
Sbjct: 423 NQANKTITGIEPGAGVMAASEKTLNSYDNLKDWKLVPSSSGAMTVALGEAIKQHKDIVIT 482 

Query: 

Sbjct: 

Query: 545 VMLDINKGMSPEAAAKKWVEANKSKVSSWTK 575 

VMLDI G +PE ARK W++ ++ +V W K 
Sbjct: 543 VMLDIQNGKTPEEAAKNWIKDHQKEVDKWFK 573

An alignment of the GAS and GBS proteins is shown below. 

Identities = 439/576 (76%), Positives = 513/576 (88%), Gaps = 2/576 (0%)

Query: 1 MENLLQHKLPVAPFVESTTNWITKTFSGLFDFIQTIGNALMDWMTKTLLFINPLLFIVLI 60
+E +LQ KLPVA VE T W+TKTFSGLFD +Q +G+ LMDWMTKTLLFI+PLLFIVL+
Sbjct: 1 LETILQTKLPVAQLVEQLTEWLTKTFSGLFDIMQWGSFLMDWMTKTLLFIHPLLFIVLV 60

Query: 61 TIAVFFLAKKKWQLPTFTFIGLLFIYNQGLWEQLINTFNLVLVASLISIIIGVPLGIWMA 120
T +FFLAKKKW LPTFT +GLLFIYNQGLW+QL+NTF LVLVASLIS++IG+PLGIWMA

+ IP ELIEASD+FGST KQKLFKVELPLAKNTIMAG+KQTMMLALSMVVTGSMIGAPGLG

REVLSALQHADIG+GFVSGL+LVILAIVLDR++Q FNSKP EK AK K KW+GL ALA

+F++AALGR ++ MTSG KG+ V IAYVQWDSEVAST+VIAEVLK++GY V LTPLDN

AVMWQTVANGNADF+TSAWLP THGQ + KYK+ LDDLGP+++ K+GL VPKYM +VNS

IE+LS QAD++ITGIEPGAGIM +A+++LK+Y NLSSM+L++ASTGAMTT+L



A related GBS gene <SEQ ID 8827> and protein <SEQ ID 8828> were also identified. Analysis of this
protein sequence reveals the following: 
Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: -5.57
GvH: Signal Score (-7.5): -5.37 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 6 value: -10.67 threshold: 0.0 

INTEGRAL Likelihood =-10.67 Transmembrane 48 - 64 ( 43 - 72)
INTEGRAL Likelihood = 3.24 Transmembrane 101 - 117 ( 93 - 122)
INTEGRAL Likelihood = Transmembrane 296 - 312 ( 290 - 316)
INTEGRAL Likelihood = Transmembrane 252 - 268 ( 250 - 273)
INTEGRAL Likelihood = Transmembrane 141 - 157 ( 138 - 170)
INTEGRAL Likelihood = Transmembrane 220 - 236 ( 220 - 237)

* Reasoning Step:



Final Results 

bacterial membrane --- Certainty=0.5267 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF00938(322 - 2025 of 2325) 

GP|7188801|gb|AAF37879.1|AF234619_2|AF234619 (8 - 573 of 573) OpuABC {Lactococcus lactis}
%Match = 44.7

%Identity = 60.2 %Similarity = 75.7

Matches = 342 Mismatches = 136 Conservative Subs = 88

255 285 315 345 375 405 435 465

MIDLAIGQVPIANWVSSATDWITSTFSSGFDVIQKSGTVLMNGITGALTAVPFWL

FIVLITIAVFFLAKKKWQLPTFTFIGLLFIYNQGLWEQLINTFNLVLVASLISIIIGVPLGIWMAKSDKVKQVVNPILDF

MIAWriLAILVSGKKIAFPLFTFIGLSLIANQGLWSDLMSTITLVLLSSLLSIIIGVPLGIWMAKSDLVAKIVQPILDF

MQTMPAFVYLIPAVAFFGIGMVPGVFASWFALPPTVRFTNIAIREIPLELIEASDSFGSTVKQ

MQTMPGFVYLIPAVAFFGIGWPGVFASVIFALPPTVRMTNLGIRQVSTELVEAADSFGSTARQKLFKLEFPLAKGTI
150 160 170 180 190 200 210

975 1005 1035 1065 1095 1125 1155 1185

GINQTMMLALSMVVTGSMIGAPGLGREVLSALQHADIGTGFVSGLSLVII^

GVNQTIMLALSMVVIASMIGAPGLGRGVLAAVQSADIGKGFVSGISLVILAIIIDRFTQKMIVSPLEKQG-NPTVKKW-K
230 240 250 260 270 280 290

1215 1245 1275 1305 1335 1365 1395 1425

LGAIALFIIiAALGRIVVTWlTSGfflFiAKGQKVKIAYVQWDS

RGIALVSLLALIIGAFSGMSFGKTASDKKVDLVYMNWDSEVASIN^

310 320 330 340 350 360 370

1455 1485 1515 1545 1575 1605 1635 1665

TSAWLPKTHGQYFNKYKNSLDDLGPHVENVKIGLWPKYMNWSIEELSNQADKQITGIEPGAGIMI^AKQSLKDYPNLS

VSAWLPNTHKTQWQKXGKSVDLLGPNLKGAKyGFWPSYMKW

390 400 410 420 430 440 450

DWKLVPSSSGAMTVALGFAIKQHKDIVITGWSPKWMFNKYDLKY

470 480 490 500 510 520 530

1935 1965 1995 2025 2055 2085 2115 2145

KWKEDMES IMLDMDKG^PAKAAQKWI KMJKKm^^

l^TKDMEAVMLDIQNSKTPEEflAKNWIKDHQKEVDKWFK
550 560 570

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
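The summary for the <SEQ ID 8828> hit against OpuABC gives raw counts (Matches = 342, Mismatches = 136, Conservative Subs = 88) next to %Identity = 60.2 and %Similarity = 75.7. Both percentages are reproduced if the counts are divided by 568, which is also the number of codons in the ORF00938 span 322 - 2025. Matches, mismatches and conservative substitutions add up to 566 columns (the same as the 8 - 573 subject span), so the Python sketch below assumes two gap columns. That gap count and the formula are our inference, not values stated in the report, and %Match = 44.7 is not modeled. The function name is hypothetical.

```python
def fasta_style_percentages(matches: int, mismatches: int,
                            conservative: int, gap_columns: int):
    """Return (aligned length, %identity, %similarity), one decimal place.

    Assumption: aligned length = matches + mismatches + conservative
    substitutions + gap columns; the gap count is not printed in the report.
    """
    aligned = matches + mismatches + conservative + gap_columns
    identity = round(100 * matches / aligned, 1)
    similarity = round(100 * (matches + conservative) / aligned, 1)
    return aligned, identity, similarity


# ORF00938 nucleotides 322-2025 inclusive -> 1704 nt -> 568 codons.
query_span = (2025 - 322 + 1) // 3
print(query_span)                                # 568
print(fasta_style_percentages(342, 136, 88, 2))  # (568, 60.2, 75.7)
```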

Example 1508 

A DNA sequence (GBSx1596) was identified in S.agalactiae <SEQ ID 4639> which encodes the amino
acid sequence <SEQ ID 4640>. This protein is predicted to be a transposase. Analysis of this protein 
sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.65 Transmembrane 223 - 239 ( 223 - 240) 
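Each ALOM line gives a likelihood, the start and end of the predicted transmembrane segment, and a wider range in parentheses. The following is a minimal Python parser that assumes exactly this printed layout. The default threshold of 0.0 is taken from the "ALOM program count: 6 value: -10.67 threshold: 0.0" line reported for <SEQ ID 8828>. Treating a segment at or below that threshold as integral is our reading of the output, and `TM_LINE`, `parse_tm_segments` and the dictionary keys are hypothetical names. The sample lines are copied from the reports in this document.

```python
import re

# Hypothetical pattern for "INTEGRAL Likelihood = x Transmembrane a - b ( c - d)".
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(?P<lik>-?\d+\.\d+)\s+Transmembrane\s+"
    r"(?P<start>\d+)\s*-\s*(?P<end>\d+)\s*\(\s*(?P<lo>\d+)\s*-\s*(?P<hi>\d+)\s*\)"
)


def parse_tm_segments(lines, threshold=0.0):
    """Return segments whose likelihood is at or below `threshold`, by position."""
    segments = []
    for line in lines:
        m = TM_LINE.search(line)
        if m is None:
            continue  # skip lines whose fields are incomplete or illegible
        likelihood = float(m["lik"])
        if likelihood <= threshold:
            segments.append({
                "likelihood": likelihood,
                "segment": (int(m["start"]), int(m["end"])),
                "range": (int(m["lo"]), int(m["hi"])),
            })
    return sorted(segments, key=lambda seg: seg["segment"][0])


report = [
    "INTEGRAL Likelihood = -1.65 Transmembrane 223 - 239 ( 223 - 240)",
    "INTEGRAL Likelihood =-10.67 Transmembrane 48 - 64 ( 43 - 72)",
]
for seg in parse_tm_segments(report):
    print(seg["segment"], seg["likelihood"])
# (48, 64) -10.67
# (223, 239) -1.65
```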



Final Results 

bacterial membrane --- Certainty=0.1659 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10057> which encodes amino acid sequence <SEQ ID 
10058> was also identified. A related GBS nucleic acid sequence <SEQ ID 10031> which encodes amino 
acid sequence <SEQ ID 10032> was also identified. A related GBS nucleic acid sequence <SEQ ID 10801> 
which encodes amino acid sequence <SEQ ID 10802> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 8 KHKHLTLLDRNDIQSGLDRGETFKAIGLNLLKHPTTIAKEVKRN- - KQLRESTKDCLDCP 65 

K+KHL + +R ++ L G + L + T+ E+-+R KQ+4-+ 4- + 

Sbjct: 12 KNKHLNMKERMIVEIRLia3GFSAYKOTKEI^PIima^IRRGTTKQIKCi3KEFHVYFA 71 

Query: 66 LLRKAPWCKGCPKRRINCGyKKTFYLAKQAQRNyEKLLVESREGIPLNKETFWKIDRVL 125 

+A Y M + + N YK +4 K +V+ K W +D + 

Sbjct: 72 DTGEAVYKKN RLKSNRKYKLL ECSDFIKYWDKV KNDHWSLDACV 115 

Query: 126 SNGVKKGQRIYHILKTNDLEVSSSTVYRHIKKGYLS1TPIDLPRAVKFKKRRECSTLPPIP 185 

G+ ++ + +S+ T+Y ++ G L I IDLP K + +KST 
Sbjct: 117 GEALHSSRFSPSQI1STKTLYNYVDLGLLPIKNIDLP--AKLHRNKKSTRVRNN 168 

Query: 186 KAIKEGRRYEDFIEHM-NQSELNSl'7LSMDTVIGRIGGK--VLLTFNVAFCNFIFAKLMDS 242 

K KG D + N+ E W E+D V+G K VLLT + MS 

Sbjct: 169 KK-KLGTSISDRPNSIEKREEFGHW-EIDCVLGEKSMKDKVLLTLVERKTRYAIISEMSS 226 

Query: 243 KTAIETAKHIQVIKRTLYDNKRDFFELFPVILTDNGGEFARVDDIE1DVCGQSQLFFCDP 302 

+ 1 K 4- IK L F E+F I DNG EFA + + E+ +++++F P 

Sbjct: 227 HSTISVTKALDKIKEFLGSK FSEVFKSITADNGSEFADLSEFELKT--KTKVYFTHP 281 

Query: 303 ]^SDQIQ^IEKNHTLVFJ3ILPKGTSFDNLTQEDIN1ALSHINSVKRQAM 362 

S +K E+++ L+R +PKG + + E 1+ + +N++ R+ L+ KT ELF 
Sbjct: 282 YSSFEKGTHERHNGLIRRFIPKGKRISDYSLETISFIENM1NTLPRKLLDYKTPEELFEI 341 

Query: 363 TYGK 366 

K 

Sbjct: 342 HLDK 345 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1509 

A DNA sequence (GBSx1597) was identified in S.agalactiae <SEQ ID 4641> which encodes the amino
acid sequence <SEQ ID 4642>. Analysis of this protein sequence reveals the following: 



Possible site: 33

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.30 Transmembrane 56 - 72 ( 48 - 79)
INTEGRAL Likelihood = -6.85 Transmembrane 11 - 27 ( 6 - 30)
INTEGRAL Likelihood = -6.69 Transmembrane 129 - 145 ( 126 - 158)
INTEGRAL Likelihood = -6.53 Transmembrane 94 - 110 ( 90 - 117)
INTEGRAL Likelihood = -1.54 Transmembrane 216 - 232 ( 215 - 232)
INTEGRAL Likelihood = -1.22 Transmembrane 147 - 163 ( 147 - 165)

Final Results

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9431> which encodes amino acid sequence <SEQ ID 9432>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:EAB07666 GB:AP001520 unknown conserved protein [Bacillus halodurans] 
Identities = 112/224 (50%), Positives = 150/224 (66%), Gaps = 2/224 (0%) 

Query: 8 IKDILWFIIPSLFGVLLLMTPFKYNGMTTVAVSVISKTINQWINAVFPIHYIILLIIFIS 67 

+KD LWF+IPS+ GV L M P + + T+ V+ ++K + ++ P I+L I + 
Sbjct: 19 LKTJYLWFLIPSIIGVGLFWPIQKIJI^ITIPVAFLAKQLQGALDDHLPAILTIMLAIvV- 77 

Query: 68 CVLALCYRLFRPSFIEKNDLLKEISDITIFWLIIRLIGIiALGMTVLHIGPEMVWGKETG 127 

VL+ LF+P+ KN LLK + I WL++R++G MT+L +GPE VW + TG 

Sbjct: 78 -VLSCVATLFKPNLFMKNGLLKSLFVIHPI^VTOVLGFIFAFMTLLQLGPFJWWSEGTG 136 

Query: 128 GLILFDLIGGLFTIFLAAGFILPFLTEFGLLEFVGVFLTPIMRPFFQLPGRSAVNCVASF 187 

L+L+DL+ LFTIFL AG LPFL FGLLE GV L MRP F LPGRS+++C+AS+ 
Sbjct: 137 ALLLYDLLPLLFTIFLFAGLFLPFLLNFGLLELFGVLLNKFMRPVFTLPGRSSIDCLASW 196 

Query: 188 VGDGTIGIALTDKQYVEGYYTSREAATISTTFSAVSITFCLXXL 231 

+GDGTIG+ LT+KQY EG+YT REAA ISTTFS VSITF + L 
Sbjct: 197 MGDGTIGVLLTNKQYEEGFYTQREAAVISTTFSWSITFSIVVL 240 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1510 

A DNA sequence (GBSx1599) was identified in S.agalactiae <SEQ ID 4643> which encodes the amino
acid sequence <SEQ ID 4644>. This protein is predicted to be a Na/H antiporter homolog (kefB). Analysis of
this protein sequence reveals the following: 
Possible site: 17

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.24
INTEGRAL Likelihood = -7.17
INTEGRAL Likelihood = -7.01
INTEGRAL Likelihood = -6.53
INTEGRAL Likelihood = -5.79
INTEGRAL Likelihood = -5.52
INTEGRAL Likelihood = -2.66

Transmembrane 214 - ... ( 209 - 233)
Transmembrane 260 - ... ( 258 - 278)
Transmembrane 287 - ...
Transmembrane 113 - ... ( 112 - 129)
Transmembrane 332 - ...

Final Results

bacterial membrane Certainty=0.5055 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA51756 GB:X73329 Na/H antiporter homolog [Lactococcus lactis] 
Identities = 208/376 (55%), Positives = 285/376 (75%), Gaps = 3/376 (0%) 

Query: 1 MHIIIQITIILLASVLATLISKRIGIPAVVGQLLVGIIIGPAMLGLVHQNQVLHVLSEIG 60

M+ I+Q+TI+L+AS++ATL S+R+ IPAV+GQ+LVGI+I P++LGLVH VL V+SEIG 
Sbjct: 1 ^^ILQLTIVLIASLIATIASRRLKIPAVIGQMLVGILIAPSvLGLVHSGHVLEVMSEIG 60 

Query: 61 VILLMFLAGLEANFDLLKKYLKPSLLVAITGVIVPMALFYFLTRLFGFQINTAIFYGLVF 120 

VILLMFLAGLE++ +LKK K S+LVAI GVTVP+ +F + FG+ ++T+ FYG+VF
Sbjct: 61 VILLMFLAGLESDLTVLKKNFKASMLVAIGGVTVPLIVFGLVAFSFGYGMSTSFFYGIVF 120

Query: 121 AATSISTTVEVLQEYNRVKTDTGAIILGAAVADDVLAVLLLSVFIA--TNGSSSNIGLQI 178
AATS+STTVEVLQEY ++ T G+IILGAAV DD+LAVL+LS+F + GS +++ Q 

Query: 179 IIQLLFFVFLFICMKYLVPALFKLIEKVHFFEKYTILAILICFSLSILADKVGMSSIIGS 238 

+++LLFF FLF+ K L+P +K ++K+ K TI+A++IC LS+LAD VGMS++IGS 
Sbjct: 181 LLELLFFAFLFWHK-DIPRFWKFVQKLPIANKNTIVALIICLGLSLLADSVGMSAVIGS 239

Query: 239 FFAGLAIGQTSF^/DKVEHKISLLSYTFFIPIFFASIALPLKFDGMMSHLHTILIFTALAV 298 

FFAGLAI QT K+E S + Y FIP+FF IA+ ++FD ++ H IL+FT LA+ 
Sbjct: 240 FFAGLAISQTEVSHKIEEYTSAIGYVIFIPVFFVLIAISVQFDSLIHHPWIILLFTLLAI 299

Query: 299 LSKLIPGYFVGRGFNFSKLESLTIGGGMVSRGEMALIIVQVGLAAKIISSTTYSELVIVV 358

L+K IP YFVG+ S ES+ IG GM+SRGEMALI + Q+GL + II+ YSELVIV+
Sbjct: 300 LTKFIPAYFVGKSNKLSTGESMLIGTGMISRGEMALIVAQIGLTSAIITDEVYSELVIVI 359

Query: 359 ILSTIIAPFILKYSFK 374 

IL+T++APF++K K 
Sbjct: 360 ILATVLAPFLIKLVLK 375 
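The summary line for this alignment gives each count as a fraction of the aligned length together with an integer percentage. As a minimal sketch, assuming the percentage is truncated rather than rounded (that is what reproduces 285/376 as 75% rather than 76%, and 3/376 as 0%), the fields can be parsed and rechecked as below; `parse_summary` and `truncated_pct` are hypothetical helper names, not part of any BLAST tool.

```python
import re
from typing import Dict, Tuple

# Matches fields such as "Identities = 208/376 (55%)" in a summary line.
FIELD = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_summary(line: str) -> Dict[str, Tuple[int, int, int]]:
    """Return {field name: (count, aligned length, printed percent)}."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in FIELD.findall(line)
    }


def truncated_pct(count: int, length: int) -> int:
    """Integer percentage truncated toward zero (assumed convention)."""
    return (100 * count) // length


if __name__ == "__main__":
    line = "Identities = 208/376 (55%), Positives = 285/376 (75%), Gaps = 3/376 (0%)"
    for name, (count, length, printed) in parse_summary(line).items():
        status = "ok" if truncated_pct(count, length) == printed else "recheck"
        print(f"{name}: {count}/{length} printed {printed}% -> {status}")
```

A mismatch only flags a line for rechecking against the source: for example, Positives = 408/458 in Example 1516 truncates to 89% while 88% is printed.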



No corresponding DNA sequence was identified in S.pyogenes. 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1511 

A DNA sequence (GBSx1600) was identified in S.agalactiae <SEQ ID 4645> which encodes the amino
acid sequence <SEQ ID 4646>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
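In each Final Results block, one location is marked (Affirmative) with a non-zero certainty while the others read 0.0000 (Not Clear). A minimal sketch of reading such a block, assuming the cleaned line format used here (location, optional dashes, Certainty=value, call in parentheses), takes the line with the highest certainty; `predicted_location` is a hypothetical helper and not part of the localisation program itself.

```python
import re
from typing import Iterable, Optional, Tuple

# Cleaned line format assumed here, e.g.
# "bacterial outside --- Certainty=0.3000 (Affirmative) < succ>"
RESULT = re.compile(
    r"^(bacterial \w+)\s*(?:-+\s*)?Certainty=\s*([0-9.]+)\s*\(([^)]+)\)"
)


def predicted_location(lines: Iterable[str]) -> Optional[Tuple[str, float, str]]:
    """Return (location, certainty, call) for the highest-certainty line."""
    hits = []
    for line in lines:
        m = RESULT.match(line.strip())
        if m:
            hits.append((float(m.group(2)), m.group(1), m.group(3)))
    if not hits:
        return None
    certainty, location, call = max(hits)
    return location, certainty, call


block = [
    "bacterial outside --- Certainty=0.3000 (Affirmative) < succ>",
    "bacterial membrane Certainty=0.0000 (Not Clear) < succ>",
    "bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>",
]
print(predicted_location(block))  # ('bacterial outside', 0.3, 'Affirmative')
```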

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14269 GB:Z99116 ypuA [Bacillus subtilis] 
Identities = 86/319 (26%), Positives = 147/319 (45%), Gaps = 34/319 (10%)

Query: 3 IKKLLFAGLAFILFTIASPAYARSDVQKVIDETYVQPDYVLGYSLNQEQRAQTLQLLNYD 62
+KK+ LA + L P + +D + + V LG L++ + + L +N
Sbjct: 1 MKKIWIGMLAAAVLLLMVPKVSLADA--AVGDVIV TLGADLSESDKQKVLDEMNVP 54

Query: 63 ...
+N+ T NI+
Sbjct: 55 ...

Query: 119 ...
J AP +V+G +AL G+ + E + ++S + KQ+A +E
Sbjct: 109 ...

Query: 178 ...
Sbjct: 169 ...

Query: 238 AVTENQINLIVNFAVNLSQSNVIKNSDFTNTLNNLKDNIVSKAGSKFKNINVNFNANKAV 297
+T++Q N +V S N +KN+D + D + KA K + +
Sbjct: 221 TLTDSQKNQLV SLFNKMKNADI--DWGQVSDQL-DKAKDKITKFIESDEGKNFI 271

Query: 298 ESGKGFLANIWQQIVNFFQ 316
+ F +IW IV+ F+
Sbjct: 272 QKVIDFFVSIWNAIVSIFK 290

No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1512 

A repeated DNA sequence (GBSx1602) was identified in S.agalactiae <SEQ ID 4647> which encodes the
amino acid sequence <SEQ ID 4648>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0603 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15719 GB:Z99122 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 76/138 (55%), Positives = 91/138 (65%), Gaps = 12/138 (8%) 

Query: 1 MKLKAVHHIAIIVSDYEKSKDFYVNKLGFEIIRENHRPERHDYKLDLRC-GDIELEIFGN 59

M LK++HHIAII SDYEKSK FYV+KLGF++I+E +R ER YKLDL G +E+F 
Sbjct: 1 MLLKSIHHIAIICSDYEKSKAFYVHKLGFQVIQETYREERGSYKLDLSLNGSYVIELF-- 58

Query: 60 RLDDPEYETPPQRIGRPNWPREACGLRHLAFYVPDVEAYKVELENLGIFVEPIRYDDYTG 119 

+ PP+R RP EA GLRHLAF V ++ EL GI EPIR D TG 
Sbjct: 59 SFPDPPERQTRP EA&GLRHLAFTVGSLDKAVQELHEKGIETEPIRTDPLTG 109 
Query: 120 KKMTFFFDPDGLPLELHE 137

K+ TFFFDPD LPLEL+E 
Sbjct: 110 KRFTFFFDPDQLPLELYE 127 
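In the middle line of each alignment block, a letter marks an identical residue, "+" a conservative substitution and a space neither. The identical columns can be counted directly from the two aligned rows; the sketch below (the helper name `identity_stats` is hypothetical) does this for the last block above, whose middle line K+ TFFFDPD LPLEL+E shows 14 identical positions out of 18. Conservative "+" positions would need a substitution matrix and are not computed here.

```python
from typing import Tuple


def identity_stats(query: str, subject: str) -> Tuple[int, int, int]:
    """Count identical and gap columns in one block of a pairwise alignment.

    Both rows come from the same block, so they must have equal length;
    '-' marks a gap in either sequence.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if "-" in (q, s))
    return identical, gaps, len(query)


query = "KKMTFFFDPDGLPLELHE"    # Query 120-137 above
subject = "KRFTFFFDPDQLPLELYE"  # Sbjct 110-127 above
print(identity_stats(query, subject))  # (14, 0, 18)
```

The whole-alignment Identities figure is the sum of such per-block counts over all blocks.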

A related DNA sequence was identified in S.pyogenes <SEQ ID 4649> which encodes the amino acid 
sequence <SEQ ID 4650>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.1205 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 99/137 (72%), Positives = 116/137 (84%)

Query: 1 MKLKAVHHIAIIVSDYEKSKDFYVNKLGFEIIRENHRPERHDYKLDLRCGDIELEIFGNR 60
MKL A+HH+AIIVSDY SKDFYVNKLGFEIIREN+RP++HDYKLDL CG IELEIFG
Sbjct: 2 MKLNAIHHVAIIVSDYHLSKDFYVNKLGFEIIRENYRPDKHDYKLDLSCGRIELEIFGKV 61

Query: 61 LDDPEYETPPQRIGRPNWPREACGLRHLAFYVPDVEAYKVELENLGIFVEPIRYDDYTGK 120

DP Y+ PP+R+ P + EACGLRHLAF V ++E+Y +L++LGI VEPIR+DDYTG+
Sbjct: 62 TSDPNYQAPPKRVSEPEFKSEACGLRHLAFRVTNIESYVDDLKSLGIPVEPIRHDDYTGE 121


Query: 121 KMTFFFDPDGLPLELHE 137 

KMTFFFDPDGLPLELHE 
Sbjct: 122 KMTFFFDPDGLPLELHE 138 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1513 

A DNA sequence (GBSx1603) was identified in S.agalactiae <SEQ ID 4651> which encodes the amino
acid sequence <SEQ ID 4652>. This protein is predicted to be an alpha-amylase. Analysis of this protein
sequence reveals the following:
Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.62 Transmembrane 14 - 30 ( 7 - 36) 

Final Results

bacterial membrane Certainty=0.5649 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG41778 GB:AF213261 sortase [Streptococcus gordonii]

Identities = 136/247 (55%), Positives = 174/247 (70%), Gaps = 2/247 (0%)

Query: 2 RNKKKSHGFFNFVRWLLVVLLIIVGLALVFNKPIRNAFIAHQSNHYQISRVSKKTIEKNK 61
R KK N + +L V+L++V LAL+FN IRN + +N YQ+S+VSKK IEKNK

Sbjct: 6 RRAKKKRSRF^IILNILSVILLLVTALALIFNSSIRl^IMvWHTNKYQVSKVSKlCEIEKNK 65 

Query: 62 KSKTSYDFSSVKSISTESILSAQTKSHNLPVIGGIAIPDVEINLPIFKGLGNTELSYGAG 121 
SK S++F V+ +STE++L+AQ K+ LPVIGGIAIP++ +NLPIF GL N L YGAG 
Sbjct: 66 ASKGSFNFEKVEPLSTEAVLNAQWKAQQLPVIGGIAIPELSLNLPIFNGLENAGLYYGAG 125



Query: 122 TMKEMQIMGGPNNYALASHHVFGLTGSSKMLFSPLEHAKKGMKVYLTDKSKVYTYTITEI 181
TMKE Q M G NYALASHHVFG+TG+++MLFSPL+ AK GMK+YLTDK KVYTY+IT + 
Sbjct: 126 TMKETQEM-GKGNYALASHHVFGITGANEMLFSPLDRAKAGMKIYLTDKEKVYTYSITSV 184

Query: 182 SKVTPEHVEVIDD-TPGKSQLTLVTCTDPEATERIIVHAELEKTGEFSTADESILKAFSK 240 

V PE V+V+DD G +++TLVTC D AT R IV LE + + IL F+K
Sbjct: 185 ENVEPERVDVVDDAMlGTAEVTLWCEDAAATSRTIWGVLESETPYKETPKKIIJStYFNK 244 

Query: 241 KYNQINL 247 

YNQ+ L 
Sbjct: 245 SYNQMQL 251 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4653> which encodes the amino acid
sequence <SEQ ID 4654>. Analysis of this protein sequence reveals the following:

Possible site: 34 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.12 Transmembrane 18 - 34 ( 13 - 38) 
INTEGRAL Likelihood = -0.32 Transmembrane 94 - 110 ( 94 - 110) 

Final Results 

bacterial membrane Certainty=0.4248 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAA73122 GB:M77279 alpha-amylase [unidentified cloning vector] 
Identities = 60/122 (49%), Positives = 85/122 (69%)

Sbjct: 4 

Query: 68 TFDFQAVEPVSTESVLQAQMAAQQLPVIGGIAIPELGINLPIFKGLGNTELIYGAGTMKEE 127 

TFDF +VE +STE+V++AQ + LPVIG IAIP + INLPIFKGL N L+ GAGTMKE+
Sbjct: 65 TFDFDSVESLSTEAVMKAQFENKNLPVIGAIAIPSVEINLPIFKGLSNVALLTGAGTMKED 124 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 147/245 (60%), Positives = 192/245 (78%)

Query: 2 RNKKKSHGFFNFVRWLLVVLLIIVGLALVFNKPIRNAFIAHQSNHYQISRVSKKTIEKNK 61

+ K++ ++ R LL+ +L+I+GLAL+FNKPIRN IA SN YQ+++VSKK I+KNK

Sbjct: 4 KQKRRKIKSMSWARKLLIAVLLILGLALLFNKPIRNTLIARNSNKYQVTKVSKKQIKKNK 63 

Query: 62 KSKTSYDFSSVKSISTESILSAQTKSHNLPVIGGIAIPDVEINLPIFKGLGNTELSYGAG 121 

++K+++DF +V+ +STES+L AQ + LPVIGGIAIP++ INLPIFKGLGNTEL YGAG
Sbjct: 64 EAKSTFDFQAVEPVSTESVLQAQMAAQQLPVIGGIAIPELGINLPIFKGLGNTELIYGAG 123 

Query: 122 TMKENQIMGGPNNYALASHHVFGLTGSSKMLFSPLEHAKKGMKVYLTDKSKVYTYTITEI 181
TMKE Q+MGG NNY+LASHH+FG+TGSS+MLFSPLE A+ GM +YLTDK K+Y Y I ++
Sbjct: 124 TMKEEQVMGGENNYSLASEffllFGITGSSQMLFSPLEFAQNGMSIYLTDKEKIYEYIIKDV 183

Query: 182 SKVTPEHVEVIDDTPGKSQLTLVTCTDPEATERIIVHAELEKTGEFSTADESILKAFSKK 241

V PE V+VIDDT G ++TLVTCTD EATERIIV EL+ +F A +LKAF+
Sbjct: 184 FTVAPERVDVIDDTAGLKEVTLVTCTDIEATERIIVKGELKTEYDFDKAPADVLKAFNHS 243

Query: 242 YNQIN 246 

YNQ44 
Sbjct: 244 YNQVS 248 

SEQ ID 4652 (GBS266) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 11; MW 26kDa). 



GBS266-His was purified as shown in Figure 205, lane 10.
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1514 

A DNA sequence (GBSx1604) was identified in S.agalactiae <SEQ ID 4655> which encodes the amino
acid sequence <SEQ ID 4656>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1934 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4657> which encodes the amino acid 
sequence <SEQ ID 4658>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1934 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 711/819 (86%), Positives = 767/819 (92%)

Query: 1 MQDKNLVDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRILYGMNELGVTPDKP 60

MQD+NL+DVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRILYGMNELGVTPDKP 
Sbjct: 1 MQDRNLIDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRILYGMNELGVTPDKP 60 

Query: 61 HKKSARITGDVMGKYHPHGDSSIYEaMVRmQtWSYRHMLVDGHGNFGSMDGDGAAAQRY 120 

HKKSARITGDVMGKYHPHGDSSIYFJ«^TOO«WSYRHMLVDGHGNFGSMDGDGAftAQRY 
Sbjct: 61 HKKSARITGDVMGKYHPHGDSSIYiaMvRMAQ»'SYRHMLvDGHGNFGSMDGDGAAAQRY 120 

Query: 121 TEARMSKIALEMLRDINKNTVDFQDNYDGSEREPLVLPARFPNLLVNGATGIAVGMATNI 180
TEARMSKIALE+LRDINKNTV+FQDNYDGSEREP+VLPARFPNLLVNGATGIAVGMATNI
Sbjct: 121 TEARMSKIALELLRDINKNTVNFQDNYDGSEREPVVLPARFPNLLVNGATGIAVGMATNI 180 

Query: 181 PPHNLGESIDAVKLVMDNPDVTTRELMEVIPGPDFPTGALVMGRSGIHRAYETGKGSIVL 240 

PPHNL ESIDAVK+VM++PD TTRELMEVIPGPDFPTGALVMGRSGIHRAY+TGKGSIVL
Sbjct: 181 PPHNLAESIDAVKMVMEHPDCTTRELMEVIPGPDFPTGALVMGRSGIHRAYDTGKGSIVL 240 

Query: 241 RSRTEIETTSNGKERIVVTEFPYGVNKTKVHEHIVRLAQEKRIEGITAVRDESSREGVRF 300

Sbjct: 241 RSRTEIETTQTGRERIVVTEFPYGVNKTKVHEHIVRLAQEKRLEGITAVRDESSREGVRF 300

Query: 301 VIEVRRAASANVILNNLFKLTSLQTNFSFTMLAIEKGVPKILSLRQIIDNYIEHQKEVIV 360

VIE+RR ASA VILNNLFKLTSLQTNFSF MLAIE GVPKILSLRQIIDNYI HQKEVI+
Sbjct: 301 VIEIRREASATVILNNLFKLTSLQTNFSFNMLAIENGVPKILSLRQIIDNYISHQKEVII 360

Query: 361 RRTQFDKAKAGARAHILEGLLVALDHLDEVITIIRNSETDTIAQAELMSRFELSERQSQA 420

RRT+FDK KA ARAHILEGLL+ALDHLDEVI IIRNSETD IAQ ELMSRF+LSERQSQA
Sbjct: 361 RRTRFDKDKAEARAHILEGLLIALDHLDEVIAIIRNSETDVIAQTELMSRFDLSERQSQA 420

Query: 421 ILDMRLRRLTGLERDKIQSEYNDLLALIADLADILAKPERVVTIIKEEMDEVKRKYADAR 480

ILDMRLRRLTGLERDKIQSEY+DLLALIADL+DILAKPER++TIIKEEMDE+KRKYA+ R
Sbjct: 421 ILDMRLRRLTGLERDKIQSEYDDLLALIADLSDILAKPERIITIIKEEMDEIKRKYANPR 480
Sbjct: 481 RTELMVGEVLSLED3DLIEEED\TL:TLSNKGYIKRLAQDEFRAQKRGGRGVQGTGV1WDD 540

Query: 541 FVRELVSTSTHDTVLFFTNLGRVYRLKAYEIPEYGRTAKGLPIVNLLKLDEGETIQTIIN 600

FVREL+STSTHDT+LFFTN GRVYRLKAYEIPEYGRTAKGLPIVNLLKL++GETIQTIIN
Sbjct: 541 FVRELISTSTHDTLLFFTNFGRVYRLKAYEIPEYGRTAKGLPIVNLLKLEDGETIQTIIN 600

Query: 601 ARKEDVAHKYFFFTTQQGIVKRTSVSEFSNIRQNGLRAINLKElTOELIjm.LIDENEDVI 660 

ARKE+ A K FFFTT+QGIVKRT VSEF+NIRQNGLRA+ LKE D+LINVLL +D+I
Sbjct: 601 ARKEETAGKSFFFTTKQGIVKRTEVSEFNNIRQNGLRALKLKEGDQLINVLLTSGQDDII 660 

Query: 661 IGTRTGYSVRFKVNAVRNMGRTATGVRGVNLREGDKVVGASRIVNGQEVLIITEKGYGKR 720

IGT +GYSVRF ++RNMGR+ATGVRGV LKE D+VVGASRI + QEVL+ITE G+GKR
Sbjct: 661 IGTHSGYSVRFNEASIRNMGRSATGVRGVKLREDDRVVGASRIQDNQEVLVITENGFGKR 720

Query: 721 TEASEYPTKGRGGKGIKTANITAKNGPLARLVTINGNEDIMVITDTGVIIRTNVANISQT 780

T A++YPTKGRGGKGIKTANIT KNG LA LVT++G EDIMVIT+ GVIIRTNVANISQT
Sbjct: 721 TSATDYPTKGRGGKGIKTANITPKNGQLAGLVTVDGTEDIMVITNKGVIIRTNVANISQT 780

Query: 781 GRSTMGVKVMRLDQEAKIVTVALVEQEIEDKSNIEDTKE 819

GR+T+GVK+M+LD +AKIVT LV+ E + I +E
Sbjct: 781 GRATLGVKIMKLDADAKIVTFTLVQPEDSSIAEINTDRE 819

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1515 

A DNA sequence (GBSx1605) was identified in S.agalactiae <SEQ ID 4659> which encodes the amino
acid sequence <SEQ ID 4660>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



>GP:CAA04010 GB:AJ000336 L-lactate dehydrogenase [Streptococcus pneumoniae]
Identities = 290/329 (88%), Positives = 313/329 (94%), Gaps = 1/329 (0%)

Query: 1 MTATKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIPALFDKAVGDAEDLSHALAF 60

MT+TKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIP L +KAVGDA DLSHALAF
Sbjct: 1 MTSTKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIPQLHEKAVGDALDLSHALAF 60

Query: 61 TSPKKIYAATYADCADADLVVITAGAPQKPGETRLDLVGKNLAINKSIVTQVVESGFNGI 120

TSPKKIYAA Y+DCADADLVVITAGAPQKPGETRLDLVGKNLAINKSIVTQVVESGF GI
Sbjct: 61 TSPKKIYAAQYSDCADADLVVITAGAPQKPGETRLDLVGKNLAINKSIVTQVVESGFKGI 120

Query: 121 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALADKIGVDARSVHAYIMGE 180

FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALA+K+ VDARSVHAYIMGE
Sbjct: 121 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALAEKLDVDARSVHAYIMGE 180

Query: 181 HGDSEFAVWSHANVAGVQLEQWLQENRDIDEQGLVDLFISVRDAAYSIINKKGATYYGIA 240

HGDSEFAVWSHAN+AGV LE++L++ +++ E L++LF VRDAAY+IINKKGATYYGIA
Sbjct: 181 HGDSEFAVWSHANIAGVNLEEFLKDTQNVQEAELIELFEGVRDAAYTIINKKGATYYGIA 240

Query: 241 VALARITKAILDDENAVLPLSVYQEGQYGOVKDVFIGQPAIVGAHGIVRPVNIPLNDAEL 300

VALARITKAILDDENAVLPLSV+QEGQYG V++VFIGQPA+VGAHGIVRPVNIPLNDAE
Sbjct: 241 VALARITKAILDDENAVLPLSVFQEGQYG-VENVFIGQPAVVGAHGIVRPVNIPLNDAET 299

Query: 301 QKMQASAEQLKDIIDEAWKNPEFQEASKN 329

QKMQASA++L+ IIDEAWKNPEFQEASKN
Sbjct: 300 QKMQASAKELQAIIDEAWKNPEFQEASKN 328
A related DNA sequence was identified in S.pyogenes <SEQ ID 4661> which encodes the amino acid
sequence <SEQ ID 4662>. Analysis of this protein sequence reveals the following:
Possible site: 25

N-terminal signal sequence

106 - 122 ( 106 - 122) 

Final Results 

bacterial membrane Certainty=0.1468 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB81558 GB:U60997 L(+)-lactate dehydrogenase [Streptococcus

Identities = 278/329 (84%), Positives = 297/329 (89%), Gaps = 2/329 (0%)

Query: 1 MTATKQHKKVILVGDGAVGSSYAFALVTQNIAQELGIIDIFK--EKTQGDAEDLSHALAF 58

MTATKQHKKVILVGDGAVGSSYAFALV Q IAQELGII+I + K GDAEDLSHALAF
Sbjct: 1 MTATKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIPQLFNKAVGDAEDLSHALAF 60 

Query: 59 TSPKItlYAMYSDCHDADLVVLTAGAPQKPGETRLDLVEKNLRINKEVvTQrvASGFKGI 118 

TSPKKIYAA Y DC DADLVV+TAGAPQKPGETRLDLV KNL INK +VT++V SGFKGI
Sbjct: 61 TSPKKIYAAKYEDCADADLVVITAGAPQKPGETRLDLVGKNLAINKSIVTEVVKSGFKGI 120

Query: 119 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALAAKIGVDARSVHAYIMGE 178

FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALA K+ VDARSVHAYIMGE 
Sbjct: 121 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALAEKLDVDARSVHAYIMGE 180 

Query: 179 HGDSEFAVWSHANVAGVGLYDWLQANRDIDEQGLVDLFISVRDAAYSIINKKGATFYGIA 238

HGDSEFAVWSHANVAGV L +L+ ++++E LV+LF VRDAAYSIINKKGATFYGIA
Sbjct: 181 HGDSEFAVWSHANVAGVVLESYLKDVQNVEEAELVELFEGVRDAAYSIINKKGATFYGIA 240

Query: 239 VALARITKAILDDENAVLPLSVFQEGQYEGVEDCYIGQPAIVGAYGIVRPVNIPLNDAEL 298

VALARITKAIL+DENAVLPLSVFQEGQY V DCYIGQPAIVGA+GIVRPVNIPLNDAE
Sbjct: 241 VALARITKAILNDENAVLPLSVFQEGQYANVTDCYIGQPAIVGAHGIVRPVNIPLNDAEQ 300

Query: 299 QKMQASANQLKAIIDEAFAKEEFASAAKN 327

QKM+ASA +LKAIIDEAF+KEEFASA KN
Sbjct: 301 QKMEASAKELKAIIDEAFSKEEFASACKN 329

An alignment of the GAS and GBS proteins is shown below. 

Identities = 286/329 (86%), Positives = 299/329 (89%), Gaps = 2/329 (0%) 

Query: 1 MTATKQHKKVILVGDGAVGSSYAFALVNQGIAQELGIIEIPALFDKAVGDAEDLSHALAF 60
MTATKQHKKVILVGDGAVGSSYAFALV Q IAQELGII+I +K GDAEDLSHALAF
Sbjct: 1 MTATKQHKKVILVGDGAVGSSYAFALVTQNIAQELGIIDI--FKEKTQGDAEDLSHALAF 58

Query: 61 ...
Sbjct: 59 ...

Query: 121 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALADKIGVDARSVHAYIMGE 180
FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALA KIGVDARSVHAYIMGE
Sbjct: 119 FLVAANPVDVLTYSTWKFSGFPKERVIGSGTSLDSARFRQALAAKIGVDARSVHAYIMGE 178

Query: 181 ...
HGDSEFAVWSHANVAGV L WLQ NRDIDEQGLVDLFISVRDAAYSIINKKGAT+YGIA
Sbjct: 179 ...

Query: 241 ...
VALARITKAILDDENAVLPLSV+QEGQY V+D +IGQPAIVGA+GIVRPVNIPLNDAEL
Sbjct: 239 ...

WO 02/34771 



PCT/GB01/04789 



-1677- 



Query: 301 

QKMQASA QLK IIDEA+ EF A+KN 
Sbjct: 299 QKMQASANQLKAIIDEAFAKEEFASAAKN 327 

SEQ ID 4660 (GBS312) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 7; MW 40kDa). 
GBS312-His was purified as shown in Figure 205, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1516 

A DNA sequence (GBSx1606) was identified in S.agalactiae <SEQ ID 4663> which encodes the amino
acid sequence <SEQ ID 4664>. This protein is predicted to be an NADH oxidase (nox). Analysis of this
protein sequence reveals the following:

N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1888 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC26485 GB:AF014458 NADH oxidase [Streptococcus pneumoniae] 
(ver 2) 

Identities = 363/458 (79%), Positives = 408/458 (88%), Gaps = 3/458 (0%) 

Query: 1 MSKIVWGTNHAGTAAIKTMLSNYGEANEIVTFDQNSNISFLGCGMALWIGEQIDGPEGL 60
MSKIWVG NHAGTA I TML N+G NEIV FDQNSNISFLGCGMALWIGEQIDG EGL
Sbjct: 1 MSKIVWGANHAGTACINTMLDNFGNENEIWFDQNSNISFLGCGMALWIGEQIDGAEGL 60

Query: 61 FYSDKEQLESMGAKVYMNSPVLNIDYDKKEVTALVDGKEHVESYEKLILATGSQPIIPPI 120
FYSDKE+LE+ GAKVYMNSPVL+IDYD K VTA V+GKEH ESYEKLI ATGS PI+PPI
Sbjct: 61 ...

Query: 121 ...
+GVEI +G+REFKATLEN+QFVKLYQN+EEVI KL+ ++R+AWG GYIGVELAEA
Sbjct: 121 ...

Query: 179 ...
F+R+GKEV LVD+ DT + GYYD+DFT MM+KNLEDH IRLA GQ V+A+EGDGKVERL+
Sbjct: 181 ...

Query: 239 ...
TDKE+FDVDMVILAVGFRPNT L GK+f FRNGA++VDKKQETS+ VYA+GDCAT++D
Sbjct: 241 ...

Query: 299 ...
N+R D +YIALASNAVRTGIV A+NACG ELEG GVQGSNGISIYGL+MVSTGLTLEKAK
Sbjct: 301 ...

Query: 359 ...
ETGFNDLQKPEF+KH+NHEVAI KI V+DKDSR ILG QMVSH+ +SMGIHMFS
Sbjct: 361 ...

Query: 418 ...
LAIQE VTI+KLALTD-l-FFLPHFKIKPYNXITMAMi A+
Sbjct: 421 ...
A related DNA sequence was identified in S.pyogenes <SEQ ID 4665> which encodes the amino acid
sequence <SEQ ID 4666>. Analysis of this protein sequence reveals the following:

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2068 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 362/456 (79%), Positives = 403/456 (87%)

Query: 1 MSKIWVGTNHAGTAAIKTMLSNYGEANEIVTFDQNSNISFLGCGMALVIIGEQIDGPEGL 60

MSKIVWG NHAGTA IKTML+NYG+ANEIV FDQNSNISFLGCGMALWIGEQI GPEGL
Sbjct: 1 MSKIVWGANHAGTACIKTMLTNYGDANEIWFDQNSNISFLGCGMALWIGEQIAGPEGL 60 

Query: 61 FYSDKEQLESMGAKVYMNSPVLNIDYDKKEVTALVDGKEHVESYEKLILATGSQPIIPPI 120

FYSDKE+LES+GAKVYM SPV +IDYD K VTALVDGK HVE+Y+KLI ATGSQPI+PPI
Sbjct: 61 FYSDKEELESLGAKVYMESPVQSIDYDAKTVTALVDGKNHVETYDKLIFATGSQPILPPI 120

Query: 121 KGVEIQEGSREFKATLENLQFVKLYQNSEEVIEKLAKPGINRVAVVGAGYIGVELAEAFQ 180

KG EI+EGS EF+ATLENLQFVKLYQNS +VI KL I RVAVVGAGYIGVELAEAFQ
Sbjct: 121 KGJffilKEGSLEFFATLiENLQFVKLYQNSADVIAKLENKDIKRvAWGAGYIGVELAEAFQ 180 

Query: 181 RIGKEVTLVDVADTCMGGYYDRDFTDMMSKNLEDHGIRLAFGQAVQAVEGDGKVERLVTD 240

R GKEV L+DV DTC+ GYYDRD TD+M+KN+E+HGI+LAFG+ V+ V G+GKVE+++TD
Sbjct: 181 RKGKEVVLIDVVDTC»GYYDRDLTDLMAKNMEEHGIQIAFGETVKEVAGNGKVEKIITD 240 

Query: 241 KETFDVDMVILAVGFRPNTELGAGKLDTFRNGAWVVDKKQETSVKDVYAIGDCATIWDNS 300

K +DVDMVILAVGFRPNT LG GK+D FRNGA++V+K+QETS+ VYAIGDCATI+DN+
Sbjct: 241 KNEYDVDMVILAVGFRPNTTLGNGKIDLFRNGAFLVNKRQETSIPGVYAIGDCATIYDNA 300

Query: 301 RDDINYIALASNAVRTGIVAAHNACGTELEGAGVQGSNGISIYGLNMVSTGLTLEKAKQA 360

D NYIALASNAVRTGIVAAHNACGT+LEG GVQGSNGISIYGL+MVSTGLTLEKAK+
Sbjct: 301 TRDTNYIALASNAVRTGIVAAHNACGTDLEGIGVQGSNGISIYGLHMVSTGLTLEKAKRL 360 

Query: 361 GYTOVETGFNDLQKPEFIKHNNHEVAIKIVYDKDSRVILGCQMVSHEDVSMGIHMFSLAI 420 

G++A T + D QKPEFI+H N V IKIVYDKDSR ILG QM + EDVSMGIHMFSLAI
Sbjct: 361 GFDAAVTEYTDNQKPEFIEHGNFPVTIKIVYDKDSRRILGAQMAAREDVSMGIHMFSLAI 420 

Query: 421 QEKVTIEKLALTDIFFLPHFNKPYNYITMAALGAKD 456

QE VTIEKLALTDIFFLPHFNKPYNYITMAALGAKD 
Sbjct: 421 QEGVTIEKLALTDIFFLPHFNKPYNYITMAALGAKD 456 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1517 

A DNA sequence (GBSx1607) was identified in S.agalactiae <SEQ ID 4667> which encodes the amino
acid sequence <SEQ ID 4668>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2319 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1518 

A DNA sequence (GBSx1608) was identified in S.agalactiae <SEQ ID 4669> which encodes the amino
acid sequence <SEQ ID 4670>. Analysis of this protein sequence reveals the following: 

Possible site: 22 



>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = ... Transmembrane 107 - 123 ( 106 - 124)
INTEGRAL Likelihood = ... Transmembrane ... ( 258 - 275)
INTEGRAL Likelihood = ... Transmembrane ... ( 233 - 251)
INTEGRAL Likelihood = ... Transmembrane ... ( 209 - 225)



Final Results

bacterial membrane --- Certainty=0.4100 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9805> which encodes amino acid sequence <SEQ ID 9806> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15146 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 172/318 (54%), Positives = 234/318 (73%) 

Query: 5 LSLTTIFALLFSSMLIYATPLIFTSIGGTFSERGGIVNVGLEGIMVIGAFSGVVFNLEFA 64
+ + I +++ + L+YA PLI T++GG FSER G+VN+GLEG+M+IGAF+ V+FNL F
Sbjct: 1 MDIVQILSIIVPATLVYAAPLILTALGGVFSERSGVVNIGLEGLMIIGAFTSVLFNLFFG 60

Query: 65 ...
3 +FS+IHA A ++FRAD
Sbjct: 61 ...

Query: 125 ...
K QTD I E F K PL DIP +G IFF
Sbjct: 121 ...

Query: 185 ...
+RSVGEHP AADT+GINVY MRY GV+ISG GG+GG VYA +I+++F +TI G GFI +
Sbjct: 181 ...

Query: 245 ...
LAA++FGKW+PIGA+ A+LFFG +QSL++IGS LPL +IP VY+ +APY+LTI+ L F
Sbjct: 241 ...

Query: 305 FGQAVAPKADGINYIKTK 322
G+A APKA+G+ YIK K
Sbjct: 301 IGRADAPKANGVPYIKGK 318


A related DNA sequence was identified in S.pyogenes <SEQ ID 4671> which encodes the amino acid
sequence <SEQ ID 4672>. Analysis of this protein sequence reveals the following: 

Possible site: 22 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.92 Transmembrane 73 - 89 ( 69 - 97) 
INTEGRAL Likelihood = -5.04 Transmembrane 160 - 176 ( 158 -

INTEGRAL Likelihood = -4.62 Transmembrane 289 - 305 ( 284 -

INTEGRAL Likelihood = -3.98 Transmembrane 234 - 250 ( 232 -

INTEGRAL Likelihood = -2.13 Transmembrane 107 - 123 ( 106 -

INTEGRAL Likelihood = -2.02 Transmembrane 43 - 59 ( 43 -

INTEGRAL Likelihood = -0.53 Transmembrane 258 - 274 ( 258 -



Final Results 

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15146 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 176/318 (55%), Positives = 239/318 (74%)

Query: 5 MSLVTIFALLMSSMLIYATPLIFTSIGGTFSERSGVVNVGLEGIMVMGAFSGIVFNLEFA 64

M +V I ++++ + L+YA PLI T++GG FSERSGVVN+GLEG+M++GAF+ ++FNL F
Sbjct: 1 MDIVQILSIIVPATLVYAAPLILTALGGVFSERSGVVNIGLEGLMIIGAFTSVLFNLFFG 60

Query: 65 ETFGKATPWIAVLVGGIVGLIFSLIHAVATINFRADHIVSGTVLNLLAPSFAVFLVKAMY 124 

+ G A PW+++L G +FSLIHA A I+FRAD VSG +N+LA +F+VK +Y 

Sbjct: 61 QELGAAAPWLSLLAAMAAGALFSLIHAAAAISFRADQTVSGVAINMLALGATLFIVKLIY 120 

Query: 125 GKGQTDNIQQSFGKFDFPGLSQIPVIGDIFFKNTSLIGYFAIAFSFFAWFLLYKTRFGLR 184 

GK QTD I + F K PGL IPV+G IFF + AIA +F +WF+L+KT FGLR 

Sbjct: 121 GKAQTDKIPEPFYKTKIPGLGDIPVLGKIFFSDVYYTSILAIALAFISWFILFKTPFGLR 180 

Query: 185 LRSVGEHPQAADTLGINVYLMKYYGVMISGFLGGIGGAVYAQSISVNFAVTTILGPGFIA 244

+RSVGEHP AADT+GINVY M+Y GVMISG GG+GG VYA +I+++F +TI G GFIA 
Sbjct: 181 IRSVGEHPMAADTMGINVYKMRYIGVMISGLFGGLGGGVYASTIALDFTHSTISGQGFIA 240 

Query: 245 LAAMIFGKWNPVGAMLSSLFFGLSQSIAVIGAQLPLLEKIPTVYLQIAPY^lVTIIIIlAAF 304 

LAA++FGKH+P+GA+ ++LFFG +QSL++IG+ LPL + IP VY+ +APY++TI+ L F 
Sbjct: 241 LAALVFGKWHPIGALGAALFFGFAQSLSIIGSLLPLFKDIPNVYMLMAPYILTILALTGF 300 

Query: 305 FGQAVAPKADGINYIKSK 322 

G+A APKA+G+ YIK K 
Sbjct: 301 IGRADAPKANGVPYIKGK 318 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 272/322 (84%), Positives = 301/322 (93%)

Query: 1 MVSKLSLTTIFALLFSSMLIYATPLIFTSIGGTFSERGGIVNVGLEGIMVIGAFSGVVFN 60

+V+K+SL TIFALL SSMLIYATPLIFTSIGGTFSER G+VNVGLEGIMV+GAFSG+VFN
Sbjct: 1 VVNKMSLVTIFALLMSSMLIYATPLIFTSIGGTFSERSGVVNVGLEGIMVMGAFSGIVFN 60

Query: 61 LEFASVFGDATPWISVLVGGLVGLIFSVIHAVATVNFRADHIISGTVLNLMAPSLAVFLI 120

LEFA FG ATPWI+VLVGG+VGLIFS+IHAVAT+NFRADHI+SGTVLNL+APS AVFL+
Sbjct: 61 LEFAETFGKATPWIAVLVGGIVGLIFSLIHAVATINFRADHIVSGTVLNLLAPSFAVFLV 120

Query: 121 KVLYNKGQTDNIQESFGKFNFPILSDIPFVGDIFFKGTSLVGYIAILFSFLAWFILYKTR 180 

K +Y KGQTDNIQ+SFGKF+FP LS IP +GDIFFK TSL+GY AI FSF AWF+LYKTR 
Sbjct: 121 KAMYGKGQTDNIQQSFGKFDFPGLSQIPVIGDIFFKNTSLIGYFAIAFSFFAWFLLYKTR 180 

Query: 181 FGLRLRSVGEHPQAADTLGINVYLMRYSGVLISGFLGGIGGAVYAQSISVNFAATTILGP 240

FGLRLRSVGEHPQAADTLGINVYLM+Y GV+ISGFLGGIGGAVYAQSISVNFA TTILGP 
Sbjct: 181 FGLRLRSVGEHPQAADTLGINVYLMKYYGVMISGFLGGIGGAVYAQSISVNFAVTTILGP 240 

Query: 241 GFISLAAMIFGKWNPIGAMLASLFFGLSQSLAVIGSHLPLLSNIPTVYLQIAPYVLTIIV 300

GFI+LAAMIFGKWNP+GAML+SLFFGLSQSLAVIG+ LPLL IPTVYLQIAPY++TII+
Sbjct: 241 GFIALAAMIFGKWNPVGAMLSSLFFGLSQSLAVIGAQLPLLEKIPTVYLQIAPYMVTIII 300

Query: 301 LAAFFGQAVAPKADGINYIKTK 322

LAAFFGQAVAPKADGINYIK+K 
Sbjct: 301 LAAFFGQAVAPKADGINYIKSK 322 
A related GBS gene <SEQ ID 8829> and protein <SEQ ID 8830> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3 



>>> Seems to have a cleavable N-term signal seq.










ALOM program count: 8 value: -7.75 threshold: 0.0

INTEGRAL Likelihood = -7.75 Transmembrane 160 - 176 ( 157 - 179)
INTEGRAL Likelihood = -7.38 Transmembrane 73 - 89 ( 70 - 97)
INTEGRAL Likelihood = -5.47 Transmembrane 289 - 305 ( 284 - 312)
INTEGRAL Likelihood = -?.09 Transmembrane 107 - 123 ( 106 - 124)
INTEGRAL Likelihood = ... Transmembrane 43 - 59
INTEGRAL Likelihood = -1.91 Transmembrane 258 - 274 ( 258 - 275)
INTEGRAL Likelihood = -?.33 Transmembrane 234 - 250 ( 233 - 251)
INTEGRAL Likelihood = -0.00 Transmembrane 209 - 225 ( 209 - 225)
PERIPHERAL Likelihood = 3.34 (at 139)

modified ALOM score: 2.05

Step: 3

Final Results

bacterial membrane --- Certainty=0.4100 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF00914(313 - 1266 of 1566) 

EGAD] 108729 1 BS315K1 - 318 of 319) hypothetical protein {Bacillus subtilis} 
GP|1934814|emb|CAB07939.1||Z93937 unknown {Bacillus subtilis}

GP|2635653|emb|CAB15146.1||Z99120 similar to hypothetical proteins {Bacillus subtilis}
PIR|F70009|F70009 conserved hypothetical protein yufQ - Bacillus subtilis 
%Match =34.9 

%Identity =54.1 %Similarity =76.4 

Matches = 172 Mismatches = 75 Conservative Sub.s = 71 
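The three counts above sum to the compared length (172 + 75 + 71 = 318), and the reported percentages follow if identity is taken as matches over that length and similarity as matches plus conservative substitutions over it. A short check of that reading is sketched below; `match_percentages` is a hypothetical name, and the formula is an inference from the printed numbers rather than a documented definition.

```python
from typing import Tuple


def match_percentages(matches: int, mismatches: int, conservative: int) -> Tuple[float, float]:
    """Percent identity and similarity over the compared positions.

    Assumes the compared length is matches + mismatches + conservative
    substitutions; this reading reproduces the figures reported above.
    """
    length = matches + mismatches + conservative
    identity = 100 * matches / length
    similarity = 100 * (matches + conservative) / length
    return round(identity, 1), round(similarity, 1)


print(match_percentages(172, 75, 71))  # (54.1, 76.4)
```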



246 



276 



306 



336 



366 



396 



TLQVFHLS*LKL*QLQSSSS*VSITLLSMLLNLKNK*KWSKLSLTTIFALLFSSMLIYATPLIFTSIGGTFSERGGIVN 
= = 1- = : = 1 = 11 111 = 1-11 I I I I 1 = 11 
MDIVQILSIIVPATLVYAAPLILTALGGVFSERSGWN 



VGLEGIMVIGAFSGWFNLEFASVFGDATPWISVLVGGLVGLIFSVIHAVATW 

=1111=1=1111= 1=111 I =1 I 11=1=1 I =11=111 I ==1111= =11 =l==l =l==l= 

IGLEGLMIIGAFTSVLFNLFFGQELGAAAPWLSLIiAAMAAGALFSLIHAAAAISFRADQ 



LYNKGQTDNIQESFGKFNFPILSDIPFVGDIFFKGTSLVGYIAILFSFLAWFILYKTRFGLRLRSVGEHPQAADTLGINV 
=111111111 I I III =1 III =11 ==l==llll=ll 1111=1111111 1111=1111 

IYGECAQTDKIPEPFYKTKIPGLGDIPvLGKIFFSDVYYTSII^AIuAFISWFILFICTPFGLRIRSVGEHPMAADTMGINV 



140 



170 



180 



190 



936 966 996 1026 1056 1086 1116 1146 

YLMRYSGVLISGFLGGIGGAVYAQSISWFAATTILGPGFISIA^IIFGKWNPIGAMLASLFFGLSQSLAVIGSHLPLLS 
I HI INM-INI Ml =l = = :| =11 I ll|:|ll = = IIIMIII= hllll-llh=lll 111 = 
YKMRYIGVMISGLFGGLGGGVYASTIAI13FTHSTISGQGFIAIAALVFGKWHPIGALGAALFFGFAQSLSIIGSLLPLFK 



230 240 250 260



NI PTVYLQIAPYVLTI I VLAAFFGQAVAPKADGINYI KTK* I KEN* YKLVSFYCL* ICEKILCENFT* 1 1 IQ*Q*NIKK* 
=11 11= =111=111= I I 1=1 1111=1= III I 
65 DIPNVYMLMAPYILTILALTGFIGRADAPKANGVPYIKGKR 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1519

A DNA sequence (GBSx1609) was identified in S. agalactiae <SEQ ID 4673> which encodes the amino
acid sequence <SEQ ID 4674>. This protein is predicted to be a ribose/galactose ABC transporter permease
protein (rbsC-1). Analysis of this protein sequence reveals the following:



Possible site: 55

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-14.59 Transmembrane 205 - 221 ( 200 - 228)
INTEGRAL Likelihood =-13.69 Transmembrane 21 - 37
INTEGRAL Likelihood = -7.27 Transmembrane 302 - 318 ( 290 - 321)
INTEGRAL Likelihood = -7.17 Transmembrane 115 - 131 ( 111 - 138)
INTEGRAL Likelihood = -4.25 Transmembrane 251 - 267 ( 250 - 268)
INTEGRAL Likelihood = -2.97 Transmembrane 63 - 79
INTEGRAL Likelihood = -2.87 Transmembrane 333 - 349 ( 328 - 349)

Final Results

bacterial membrane --- Certainty=0.6838 (Affirmative)
bacterial outside --- Certainty=0.0000 (Not Clear)
bacterial cytoplasm --- Certainty=0.0000 (Not Clear)



A related GBS nucleic acid sequence <SEQ ID 8831> which encodes amino acid sequence <SEQ ID 8832>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
SRCFLG: 0

McG: Length of UR: 24

Peak Value of UR: 3.06
Net Charge of CR: 3
McG: Discrim Score: 12.53
GvH: Signal Score (-7.5): -5.31

Possible site: 46
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 7 value: -14.59 threshold:
INTEGRAL Likelihood =-14.59 Transmembrane 196 - 212 ( 191 - 219)
INTEGRAL Likelihood =-13.69 Transmembrane 12 - 28 ( 4 - 36)
INTEGRAL Likelihood = -7.27 Transmembrane 293 - 309 ( 281 - 312)
INTEGRAL Likelihood = -7.17 Transmembrane 106 - 122 ( 102 - 129)
INTEGRAL Likelihood = -4.25 Transmembrane 242 - 258 ( 241 - 259)
INTEGRAL Likelihood = -2.97 Transmembrane 54 - 70 ( 54 - 71)
INTEGRAL Likelihood = -2.87 Transmembrane 324 - 340 ( 319 - 340)
PERIPHERAL Likelihood = 0.16 133
modified ALOM score: 3.42
icml HYPID: 7 CFP: 0.684

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6838 (Affirmative)
bacterial outside --- Certainty=0.0000 (Not Clear)
bacterial cytoplasm --- Certainty=0.0000 (Not Clear)
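Each INTEGRAL line pairs an ALOM likelihood with a predicted transmembrane core segment and, in parentheses, a wider range. The minimal sketch below extracts those fields; the helper `parse_tm` and its regular expression are hypothetical and assume the line layout shown in these listings. Every core segment listed in these examples spans 17 residues inclusive (for example 106 - 122).

```python
import re

# Assumed layout: "INTEGRAL Likelihood = -7.17 Transmembrane 106 - 122 ( 102 - 129)"
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
    r"(?:\s*\(\s*(\d+)\s*-\s*(\d+)\s*\))?"
)


def parse_tm(line: str):
    """Return (likelihood, start, end, extended_range or None), or None if no match."""
    m = TM_LINE.search(line)
    if m is None:
        return None
    likelihood = float(m.group(1))
    start, end = int(m.group(2)), int(m.group(3))
    extended = (int(m.group(4)), int(m.group(5))) if m.group(4) else None
    return likelihood, start, end, extended


rec = parse_tm("INTEGRAL Likelihood = -7.17 Transmembrane 106 - 122 ( 102 - 129)")
print(rec, rec[2] - rec[1] + 1)  # (-7.17, 106, 122, (102, 129)) 17
```

PERIPHERAL lines carry no segment boundaries, so the sketch returns None for them.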



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15145 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 154/349 (44%) , Positives = 220/349 (62%) , Gaps = 6/349 (1%) 

Query: 10 MSKKAQKIAVPLISVVLGIILGAIIMLIFGYDPLWGYEGLFQTAFGSIKNIGEIFRAMGP 69
Sbjct: 1

Query: 70 LILIALGFSVASRAGFFNIGLPGQALSGWIAAGWFALSHPDMPRPAMILCTIIIGIVAGG 129

IL L + A R G FNIG+ GQ L GW AA W + DP + +1 AGG 
Sbjct: 61 YILSGIAVAFAFRTGLFNIGVEGQLLVGWTAAVWGTAF-IX3PAYIHLPLALITAAAAGG 119 

Query: 130 ITGAIPGILRAYLGTSEVIVTIMMNYIVLYSGNAIVQRVFPKSIMRTSDSSVYVSANASY 189

+ G IPGIL+A EVIVTIMMNYI L+ N 1+ V D + + +AS 

Sbjct: 120 LWGF I PGILKARFYVHEVIVTIMMNYI ALHMTTNYI I SNVLTDH QDKTGKIHESASL 175 

Query: 190 QTDWLSSLTNNSRINIGIFIAIIAVVLVWFLLNKTTLGFEIRSVGLNPNASEYAGMSAKR 249 

++ +L +T+ SR+++GI +A++A V++WF++NK+T GFE+R+VG M +AS+YAGMS ++ 
Sbjct: 176 RSPFLEQITDYSRLHLGIIVALLAAVIMWFIINKSTKGFELRAVGFNQHASQYAGMSWK 235 

Query: 250 TIILSMIISGAFAGLGGVVEGLGTFENVFVQPSSLAIGFDGMAVSLLAANSPIGILFAAF 309

1+ SM+ISGAFAGL G +EGLGTFE V+ + +GFDG+AV+LL N+ +G++ AA 
Sbjct: 236 NIMTSMLISGAFAGLAGAMEGLGTFEYAAVKGAFTGVGFDGIAVALLGGNTAVGWLAAC 295 

Query: 310 LFGVLSVGAPGMMI-AGIPPELIKWTASIIFFVGVHYIIEYVIKPKKQ 357 

L G L +GA M I +G+P E++ +V A II FV Y I +V+ K+ 
Sbjct: 296 LLGGLKIGALNMPIESGVPSEWDIVIAIIILFVASSYAIRFVMGICLKK 344 

A related DNA sequence was identified in S. pyogenes <SEQ ID 2149> which encodes the amino acid 
sequence <SEQ ID 2150>. Analysis of this protein sequence reveals the following: 



>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = ... Transmembrane 205 - 221 ( 200 - 228)
INTEGRAL Likelihood = ... Transmembrane 115 - 131
INTEGRAL Likelihood = ... Transmembrane 326 - 342
INTEGRAL Likelihood = ... Transmembrane ...
INTEGRAL Likelihood = ... Transmembrane ...
INTEGRAL Likelihood = ... Transmembrane ...
INTEGRAL Likelihood = ... Transmembrane ...
INTEGRAL Likelihood = ... Transmembrane ...

Final Results

bacterial membrane --- Certainty=0.?095 (Affirmative)
bacterial outside --- Certainty=0.0000 (Not Clear)
bacterial cytoplasm --- Certainty=0.0000 (Not Clear)



An alignment of the GAS and GBS proteins is shown below. 

Identities =293/358 (81%), Positives = 333/358 (92%), Gaps = 



Query: 6 RRREMSKKAQKIAVPLISVVLGIILGAIIMLIFGYDPLWGYEGLFQTAFGSIKNIGEIFR 65
RR+ MSK AQKIAVPLISV+LG +LGAIIM+IFGYDP+WGYEGLFQ AFGS+KNIGEIFR
Sbjct: 6 RRKVMSKNAQKIAVPLISVLLGFLLGAIIMVIFGYDPIWGYEGLFQIAFGSVKNIGEIFR 65



Query: 66 AMGPLILIALGFSVASRAGFFNIGLPGQALSGWIAAGWFALSHPDMPRPAMILCTIIIGI 125 

+MGPLILIALGF+VASRAGFFN+GL GQAL+GWI+AGWFAL +PDMPRP +IL T +IG+
Sbjct: 66 SMGPLILIALGFWASRAGFFNVGLSGQALAGWISAGWFALLNPDMPRPLLILMTALIGM 125

Query: 126 VAGGITGAIPGILRAYLGTSEVIVTIMMNYIVLYSGNAIVQRVFPKSIMRTSDSSVYVSA 185

+AGGI GAIPGILRAYLGTSEVIVTIMMNYI+LY GNAIVQR +P+S+ ++ DS++ VS
Sbjct: 126 IAGGIAGAIPGILRAYLGTSEVIVTIMMNYIILYVGNAIVQRGYPESVKQSIDSTIQVSD 185 



Sbjct: 186 NASYQTHWLSALTNNSRINIGIFFAIIAIALIWFLLNKTTLGFEIRSVGLNPHASEYAGM 245 

Query: 246 SAKRTIILSMIISGAFAGLGGVVEGLGTFENVFVQPSSLAIGFDGMAVSLLAANSPIGIL 305

S+KRTIILSMIISGA AGLGGVVEGLGTFENVFVQ SSLA+GFDGMAVSLLAANSP+GI
Sbjct: 246 SSKRTIILSMIISGAIAGLGGVTEGLGTFELWFVG^SSIAVGFDG^VSLLAANSPLGIF 305 

Query: 306 FAAFLFGVLSVGAPGMNIAGIPPELIKVVTASIIFFVGVHYIIE-YVIKPKKQMKGGK 362

F++FLFGVL++GAPGMNIAGIPPEL+KVVTASIIFFVG HY+IE Y+I+PKK +KGGK 
Sbjct: 306 FSSFLFGVLNIGAPGMNIAGIPPELVKVVTASIIFFVGSHYLIERYIIRPKKLVKGGK 363

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1520 

A DNA sequence (GBSx1610) was identified in S. agalactiae <SEQ ID 4675> which encodes the amino
acid sequence <SEQ ID 4676>. This protein is predicted to be a sugar ABC transporter ATP-binding protein
(mglA). Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3851 (Affirmative)

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

A related GBS nucleic acid sequence <SEQ ID 9803> which encodes amino acid sequence <SEQ ID 9804> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.



>GP:CAB15144 GB:Z99120 similar to ABC transporter (ATP-binding
protein) [Bacillus subtilis]
Identities = 311/497 (62%) , Positives = 396/497 (79%) , Gaps = 1/497 (0%) 

Query: 14 VIEMKEITKKFGDFVANDHINLTVEKGEIHALLGENGAGKSTLMNMLAGLLEPTDGQIFI 73

VIEM I K F VAND+INL V+KGEIHALLGENGAGKSTLMN+L GL +P G+I + 
Sbjct: 4 VIEMmiRKAFPGIVANDNINLQVKKGEIHALLGENGAGKSTLMNVLFGLYQPERGEIRV 63 

Query: 74 NGQPVTIDSPSKSSQLGIGMVHQHFMLVEAFTVAENIVLGNETTQNGVLDIKTAAKEIKE 133

G+ V I+SP+K++ LGIGMVHQHFMLV+ FTVAENI+LG E + G +D K A +E+++ 
Sbjct: 64 RGEKVHINSPNKANDLGIGMVHQHFMLVBTFTVAENIILGKEPPCKFGRIDRKRAGQEVQD 123 

Query: 134 LSEKYGLSVNPNAKISDISVGAQQRVEILKTLYRGADILIFDEPTAVLTPSEIKELMTIM 193 

+S++YGL ++P AK +DISVG QQR EILKTLYRGADILIFDEPTAVLTP EIKELM IM 
Sbjct: 124 ISDRYGLQIHPEAKAADISVGMQQRAEILKTLYRGADILIFDEPTAVLTPHEIKELMQIM 183 

Query: 194 KSLVKEGKSIILITHKLDEIRAVADKVTVIRRGKSIETVPVAGASSQQLAEMMVGRSVSF 253

K+LVKEGKSIILITHKL EI + D+VTVIR+GK I+T+ V + +LA +MVGR VSF 
Sbjct: 184 KNLVKEGKSIILITHKLKEIMEICDRVTVIRKGKGIKTLDVRDTNQDELASLMVGREVSF 243 



Query: 254 RTEKKEANPTDIILSVKDLVVEENRGGVLAVKNLSLDVRAGEIVGIAGIDGNGQSELIQA 313 

+TEK+ A P +L++ + V++ R G+ V++LSL V+AGEIVGIAG+DGNGQSELI+A 
Sbjct: 244 KTEKRAAQPGAEVIAIDGITVKDTR-GIETVRDLSLSVKAGEIVGIAGVDGNGQSELIEA 302 

Query: 314 ITGLRKVTSGQIVIKGKDVTKFSSRQITELSVGHVPEDRHRDGLVLDMTMAENLALQTYY 373

+TGLRK SG I + GK + + R+ITE +GH+P+DRH+ GLVLD + EN+ LQ+YY 
Sbjct: 303 VTGLRKTDSGTITLNGKQIQNLTPRKITESGIGHIPQDRHKHGLVLDFPIGENILLQSYY 362

Query: 374 KEPLSHKGILNFAKIKEYARQLMTEFDVRGAGEHVLARGFSGGNQQKAIIAREVDRDPDL 433

K+P S G+L+ ++ + AR L+TE+DVR E+ AR SGGNQQKAII RE+DR+PDL 
Sbjct: 363 KKPYSALGVmKGEMYKKARSLITEYDVRTPDEYTHARALSGGNQQKAIIGREIDRNPDL 422 

Query: 434 LIVSQPTRGLDVGAIEYIHKRLIEERDKGKAVLVVSFELDEILNLSDRIAVIHDGKIQGI 493

LI +QPTRGLDVGAIE++HK+LIE+RD GKAVL++SFEL+EI+NLSDRIAVI +G+I 
Sbjct: 423 LIAAQPTRGLDVGAIEFVHKKLIEQRDAGKAVLLLSFELEEIMNLSDRIAVIFEGRIIAS 482 



Query: 494 VKPDQTNKQELGILMAG 510
V P +T +QELG+LMAG 
Sbjct: 483 VNPQETTEQELGLLMAG 499
Identities = 75/242 (30%) , Positives = 128/242 (51%) , Gaps 



G++A N++L V+ GEI + G +G G+S L+ + GL + G+I ++G+ V 



+L +G V H+ +++D T+AEN+ h KEP 

-KEPKK--



Query: 280
Sbjct: 16
Query: 340
Sbjct: 76
Query: 397
Sbjct: 123
452
Sbjct: 183
512
Sbjct: 240



-FSGGNQQKAIIAREVDRDPDLLIVSQPTRGL---DVGAIEYI 451
S G QQ+A I + + R D+LI +PT L ++ + I 



HKRLIEERDKGKAVLVVSFELDEILNLSDRIAVIHDGKIQGIVKPDQTNKQELGILMAGG 511
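The percentages printed in the "Identities =" and "Positives =" summary lines appear to be truncated rather than rounded: for GP:CAB15144, 396/497 is 79.7% and is reported as 79%, and 311/497 is 62.6% and is reported as 62%. The sketch below reproduces those figures under that assumption; the helper `reported_percent` is hypothetical and the truncation rule is inferred from the numbers, not from the program's documentation.

```python
def reported_percent(count: int, length: int) -> int:
    """Percentage as printed in the alignment summaries, assuming integer truncation."""
    return (100 * count) // length


# (count, alignment length, percentage printed in the summary line)
checks = [(311, 497, 62), (396, 497, 79), (431, 511, 84), (467, 511, 91)]
for count, length, printed in checks:
    print(count, length, reported_percent(count, length), printed)
```

Rounding to the nearest integer would instead give 80% for 396/497, which does not match the printed value.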



A related DNA sequence was identified in S.pyogenes <SEQ ID 4677> which encodes the amino acid 
sequence <SEQ ID 4678>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.3558 (Affirmative)

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside --- Certainty=0.0000 (Not Clear)

An alignment of the GAS and GBS proteins is shown below. 

Identities = 431/511 (84%) , Positives = 467/511 (91%) , Gaps = 1/511

Query: 10
Sbjct: 7

Query: 70
+ 1 IN +PV IDSPSKS++LGIGMVHQHFMLVEAFTVAENI+LGNE +NG LD+ A+K
Sbjct: 67 EIVINDKPVQIDSPSKSAKLGIGMVHQHFMLVEAFTVAENIILGNEVVKNGCLDLNQASK 126

Query: 130 EIKELSEKYGLSVNPNAKISDISVGAQQRVEILKTLYRGADILIFDEPTAVLTPSEIKEL 189
+IK LSEKYGL++NP+AK+SDISVGAQQRVEILKTLYRGADILIFDEPTAVLTP+EIKEL
Sbjct: 127 DIKVLSEKYGLAINPSAKVSDISVGAQQRVEILKTLYRGADILIFDEPTAVLTPAEIKEL 186

Query: 190 MTIMKSLVKEGKSIILITHKLDEIRAVADKVTVIRRGKSIETVPVAGASSQQLAEMMVGR 249
MTIMK+LVKEGKSIILITHKLDEIRAVAD+VTVIRRGKSIETV VAGA+SQ LAEMMVGR
Sbjct: 187 MTIMKNLVKEGKSIILITHKLDEIRAVADRVTVIRRGKSIETVDVAGATSQDLAEMMVGR 246

Query: 250 SVSFRTEKKEANPTDIILSVKDLVVEENRGGVLAVKNLSLDVRAGEIVGIAGIDGNGQSE 309
SVSF T KK A P D++LS+K+L V+ENR GV AVK LSLDVRAGEIVGIAGIDGNGQSE
Sbjct: 247 SVSFTTSKKAAEPKDVVLSIKNLEVDENR-GVPAVKGLSLDVRAGEIVGIAGIDGNGQSE 305

Query: 310 LIQAITGLRKVTSGQIVIKGKDVTKFSSRQITELSVGHVPEDRHRDGLVLDMTMAENLAL 369
LIQAITGLRKV SG I+IK +VT SSR+ITELSVGHVPEDRHRDGL+LD+++AEN AL
Sbjct: 306 LIQAITGLRKVKSGSIMIKNNEVTHLSSRKITELSVGHVPEDRHRDGLILDLSLAENTAL 365

Query: 370 QTYYKEPLSHKGILNFAKIKEYARQLMTEFDVRGAGEHVLARGFSGGNQQKAIIAREVDR 429
QTYYK+PLS GILN+ KI +YARQLM EFDVRGA E V ARGFSGGNQQKAIIAREVDR
Sbjct: 366 QTYYKQPLSQNGILNYTKINDYARQLMKEFDVRGANELVPARGFSGGNQQKAIIAREVDR 425

Query: 430 DPDLLIVSQPTRGLDVGAIEYIHKRLIEERDKGKAVLVVSFELDEILNLSDRIAVIHDGK 489
DPDLLIVSQPTRGLDVGAIEYIHKRLI+ERDKGKAVLVVSFELDEILNLSDRIAVIHDGK
Sbjct: 426 DPDLLIVSQPTRGLDVGAIEYIHKRLIKERDKGKAVLVVSFELDEILHLSDRIAVIHDGK 485

Query: 490 IQGIVKPDQTNKQELGILMAGGKIEKEERDV 520

IQGIV P+ TNKQELGILMAGG IKE V 
Sbjct: 486 IQGIVSPENTNKQELGILMAGGSIHKEEGHV 516

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1521 

A DNA sequence (GBSx1612) was identified in S. agalactiae <SEQ ID 4679> which encodes the amino
acid sequence <SEQ ID 4680>. Analysis of this protein sequence reveals the following:

Possible site: 22 

>>> May be a lipoprotein

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear)

bacterial outside --- Certainty=0.0000 (Not Clear)

bacterial cytoplasm --- Certainty=0.0000 (Not Clear)

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15143 GB:Z99120 similar to ABC transporter (lipoprotein)
[Bacillus subtilis] 
Identities = 164/335 (48%) , Positives = 224/335 (65%) , Gaps = 10/335 (2%) 

Query: 18 LAACGHRGASKSGGKS-DSLKVAMVTDTGGVDDKSFNQSGWEGMQAWGKKNGLKKGA-GF 75

L ACG+ S G+ + VAMVTD GGVDDKSFNQS WEG+QA+GK+NGLKKG G+ 
Sbjct: 11 LGACGNSEKSSGSGEGKNKFSVAMVTDVGGVDDKSFNQSAWEGIQAFGKENGLKKGKNGY 70 

Query: 76 DYFQSASESDYATNLDTAVSSGYKLIFGIGFSLHDAIDKAADNNKDVNYVIVDDVIKGKD 135 

DY QS S++DY TNL+ + LI+G+G+ + D+I + AD K+ N+ I+D V+ KD

Sbjct: 71 DYLQSKSDADYTTNnNKLARENFDLIYGVGYLMEDSISEIADQRKNTNFAIIDAVVD-KD 129

Query: 136 NVASVVFADNESAYLAGIAAAKTTKTKTVGFVGGMESEVITRFEKGFEAGVKSVDKSIKI 195

NVAS+ F + E ++L G+AAA ++K+ +GFVGGMESE+I +FE GF AGV++V+ +
Sbjct: 130 NVASITFKEQEGSFLVGVAAALSSKSGKIGFVGGMESELIKKFEVGFRAGVQAVNPKAVV 189

Query: 196 KVDYAGSFGDAAKGKTIAAAQYASGADIVYQVAGGTGAGVFSEAKSRNESLKEADKVWVL 255

+V YAG F A GK A + Y SG D++Y AG TG GVF+EAK+ + + D VWV+ 
Sbjct: 190 EVKYAGGFDKADVGKATAES^KSGVDVIYHSAGATGTGVFTEAKNLKKEDPKRD-VWVI 248 

Query: 256 GVDRDQAAEGKYTSKDGKASNFVLASSIKEVGKSVELIATKTSKGKFPGGNVTTYGLKDG 315 

GVD+DQ AEG+ +G H IS +K+V VE + K S GKFPGG TYGL 
Sbjct: 249 GVDKDQYAEGQV EGTDDNVTLTSMVTCKVDTVVEDVTKKASDGKFPGGETLTYGLDQD 305 

Query: 316 GVDIATT--NLSDDAVKAIKEAKAKIISGDIKVPS 348

GV I+ + NLSDD +KA+ + K KII G +++P+
Sbjct: 306 GVGISPSKQNLSDDVIKAVDKWKKKIIDG-LEIPA 339 

A related DNA sequence was identified in S. pyogenes <SEQ ID 861> which encodes the amino acid
sequence <SEQ ID 862>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

bacterial cytoplasm --- Certainty=0.0000 (Not Clear)



An alignment of the GAS and GBS proteins is shown below.

Identities = 275/351 (78%) , Positives = 312/351 (88%) , Gaps = 3/351 (0%)

Query: 1 MNKKIAGIGIASIAVLSIAACGHRGASKSG--GKSDSLKVAMVTDTGGVDDKSFNQSGWE 58

MNKK G+GLAS+AVLSIAACG+RGASK G GK+D LKVAMVTDTGGVDDKSFNQS WE 
Sbjct: 1 MNKKFIGLGIASVAVLSLARCGl^GASKGGASGKTD-LKVAMVTDTGGVDDKSFNQSAWE 59 



Query: 59 

G+Q+WGK+ GL+KG GFDYFQS SES+YATNLDTAVS GY+LI +GIGF+L DAI KAA + 
Sbjct: 60 GLQSWGKEMGLQKGTGFDYFQSTSESEYATNLDTAVSGGYQLIYGIGFALKDAIAKAAGD 119 

Query: 119 NKDVNYVIVDDVIKGKDNVASVVFADNESAYLAGIAAAKTTKTKTVGFVGG 178

N+ V +VI+DD+I+GKDNVASV FAD+E+AYLAGIAAAKTTKTKTVGFVGGME VITRF 
Sbjct: 120 NEGVKFVIIDDIIEGKDNVASVTFADHEAAYLAGIAAAKTTKTKTVGFVGGMEGTVITRF 179

Query: 179 EKGFEAGVKSVDKSIKIKVDYAGSFGDAAKGKTIAAAQYASGADIVYQVAGGTGAGVFSE 238 

EKGFEAGVKSVD +I++KVDYAGSFGDAAKGKTIAAAQYA+GAD++YQ AGGTGAGVF+E
Sbjct: 180 EKGFEAGVKSVDDTIQVKVDYAGSFGDAAKGKTIAAAQYAAGADVTYQAAGGTGAGVFNE 239

Query: 239 AKSRNESLKEADKVWVLGVDRDQAAEGKYTSKDGKASNFVLASSIKEVGKSVELIATKTS 298
AK+ NE EADKVWV+GVDRDQ EGKYTSKDGK +NFVLASSIKEVGK+V+LI + + 

Query: 299 KGKFPGGNVTTYGLKDGGVDIATTNLSDDAVKAIKEAKAKIISGDIKVPSK 349 

KFPGG T YGLKDGGV+IATTN+S +AVKAIKEAKAKI SGDIKVP K
Sbjct: 300 DKKFPGGKTTVYGLKDGGVEIATTNVSKEAVKAIKEAKAKIKSGDIKVPEK 350 

A related DNA sequence was identified in S. pyogenes <SEQ ID 9061> which encodes amino acid sequence
<SEQ ID 9062>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> May be a lipoprotein

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear)

bacterial outside --- Certainty=0.0000 (Not Clear)

bacterial cytoplasm --- Certainty=0.0000 (Not Clear)

An alignment of the GAS and GBS sequences follows: 



Query: 1 MNKKVMSLGLVSTALFTLGGCTNNSAKQT--TDNSLKIAMITNQTGIDDKSFNQSAWEGL 58 

MNKK+ +GL S A+ +L C + A ++ +SLK+AM+T+ G+DDKSFNQS WEG+ 
Sbjct: 1 MNKKIAGIGIASIAVLSIAACGHRGASKSGGKSDSLKVAMVTDTGGVDDKSFNQSGWEGM 60



Query: 119 DNHFAIVDDVIKGQKNVASITFSDHEAAYLAGVXXXXXXXXXQVGFVGGMEGDVVKRFEK 178

D ++ IVDDVIKG+ NVAS+ F+D+E+AYLAG+ VGFVGGME +V+ RFEK 

Sbjct: 121 DVNYVIVDDVIKGKDNVASVVFADNESAYLAGIAAAKTTKTKTVGFVGGMESEVITRFEK 180



Sbjct: 181 GFEAGVKSVDKSIKIKVDYAGSFGDAAKGKTIAAAQYASGADIVYQVAGGTGAGVFSEAK 240

Query: 239 SINEKRKEEDKVWVIGVDRDQSEDGKYTTKDGKSANFVLTSSIKEVGKALVKVAVKTSED 298

S NE KE DKVWV+GVDRDQ+ +GKYT+KDGK++NFVL SSIKEVGK++ +A KTS+ 
Sbjct: 241 SRNESLKEADKVWVLGVDRDQAAEGKYTSKDGKASNFVLASSIKEVGKSVELIATKTSKG 300

Query: 299 QFPGGQITTFGLKEGGVSLTTDALTQDTXXXXXXXXXXXXXGTITVP 345

+FPGG +TT+GLK+GGV + T L+ D G I VP 

Sbjct: 301 KFPGGNVTTYGLKDGGVDIATTNLSDDAVKAIKEAKAKIISGDIKVP 347
SEQ ID 4680 (GBS211) was expressed in E. coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 49 (lane 6; MW 40 kDa).

The GBS211-His fusion product was purified (Figure 205, lane 8) and used to immunise mice. The
resulting antiserum was used for Western blot (Figure 259A) and FACS (Figure 259B). These tests confirm
that the protein is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1522 

A DNA sequence (GBSx1613) was identified in S. agalactiae <SEQ ID 4681> which encodes the amino
acid sequence <SEQ ID 4682>. This protein is predicted to be a cytidine deaminase (cdd). Analysis of this
protein sequence reveals the following:

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2112 (Affirmative)

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

A related GBS nucleic acid sequence <SEQ ID 9801> which encodes amino acid sequence <SEQ ID 9802>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB51906 GB:AJ237978 cytidine deaminase [Bacillus psychrophilus] 
Identities = 66/114 (57%) , Positives = 81/114 (70%) 


Query: 26 KASENAYVPYSKFPVGAALRTAEGKIFTGCNVENISYGLANCAERTAIFKAVSEGYKDFS 85

KA E AYVPYSKFPVGAAL +G I+ GCN+EN +Y + NCAERTA FKAVS+G + F
Sbjct: 12 KAREQAYVPYSKFPVGAALLAEDGTIYHGCNIENSAYSMTNCAERTAFFKAVSDGVRSFK 71 

Query: 86 EIAIYGNTEEPISPCGACRQVMVEFFNKNAKVTLIAKNGKTVETTVGELLPYSF 139

+A+ +TE P+SPCGACRQV+ EF N + V L G ETTV +LLP +F 
Sbjct: 72 ALAWADTEGPVSPCGACRQVIAEFCNGSMPVYLTN1.KGDIEETTVAKLLPGAF 125 

A related DNA sequence was identified in S. pyogenes <SEQ ID 4683> which encodes the amino acid
sequence <SEQ ID 4684>. Analysis of this protein sequence reveals the following:

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0041 (Affirmative)

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

The protein has homology with the following sequences in the databases: 

>GP:CAB15143 GB:Z99120 similar to ABC transporter (lipoprotein)

[Bacillus subtilis] 

Identities = 152/339 (44%) , Positives = 223/339 (64%) , Gaps = 11/339 (3%) 

Query: 8 LGLVSTALFTLGGCTNN SAKQTTDNSLKIAMITNQTGIDDKSFNQSAWEGLQAWGKE 64 

+ LV A LG C N+ S N +AM+T+ G+DDKSFNQSAWEG+QA+GKE

Sbjct: 1 MSLVIAAGTILGACGNSEKSSGSGEGKNKFSVAMVTDVGGVDDKSFNQSAWEGIQAFGKE 60 
61
An alignment of the GAS and GBS proteins is shown below.

Identities = 88/128 (68%) , Positives = 107/128 (82%) 

Query: 15 MGNIELKKIAVKASENAYVPYSKFPVGAALRTAEGKIFTGCNVENISYGLANCAERTAIF 74

MG +L AV+ASE AYVPYS FPVGAAL+T +G I+TGCN+EN+S+GL NC ERTAIF 
Sbjct: 1 MGTTDLVSCAVQASEYAYVPYSHFPVGAALKTKDGTIYTGCNIENVSFGLTNCGERTAIF 60

Query: 75 KAVSEGYKDFSEIAIYGNTEEPISPCGACRQVMVEFFNKNAKVTLIAKNGKTVETTVGEL 134

KA+S+G+K+ EIAIYG T +P+SPCGACRQVM EFF+ ++ VTLIAKNG+TVE TVG+L
Sbjct: 61 KAISDGHKELVEIAIYGETMQPVSPCGACRQVMAEFFDPSSLVTLIAKNGQTVEMTVGDL 120 

Query: 135 LPYSFVDL 142 

L YSF DL 
Sbjct: 121 LLYSFTDL 128 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1523 

A DNA sequence (GBSx1614) was identified in S. agalactiae <SEQ ID 4685> which encodes the amino
acid sequence <SEQ ID 4686>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0 .2979 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9799> which encodes amino acid sequence <SEQ ID 9800>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11882 GB:Z99104 alternate gene name: ybaA-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 90/201 (44%) , Positives = 144/201 (70%) , Gaps = 5/201 (2%) 

Query: 1 MANMYYTENPNVEHDIHELNVKLLGESFSFLTDAGVFSKRMIDYGSQVLLNSLHF-EKNK 59

M+ YY+E P+V+ + + +L + F+F +D+GVFSK+ +D+GS++L++S E 
Sbjct: 1 MSEHYYSEKPSVKSNKQTWSFRLENKDFTFTSDSGVFSKKEVDFGSRLLIDSFEEPEVEG 60 
Query: 60 SLLDLGCGYGPLGISLAK-VQGVKATMVDINTRALELAKKNATRNGVV-VEVFQSNIYEN 117

+LD+GCGYGP+G+SLA + M+D+N RA+EL+ +NA +NG+ V+++QS+++ N 

Sbjct: 61 GILDVGCGYGPIGLSIASDFKDRTIHMIDVNERAVELSNENAEQNGITNVKIYQSDLFSN 120 

Query: 118 I--SKTFDYIISNPPIRAGKQVVHSIIEESICYLHTGGSLTIVIQKKQGAPSAKAKMLDT 175

+ ++TF I++NPPIRAGK+VVH+I E+S +L G L IVIQKKQGAPSA K+ +
Sbjct: 121 VDSAQTFASILTNPPIRRGKKVVHAIFEKSAEHLKASGSLWIVIQKKQGAPSAIEKLEEL 180

Query: 176 FGNCDILKKDKGYYILRSEKV 196

F +++K KGYYI++++KV
Sbjct: 181 FDEVSVVQKKKGYYIIKAKKV 201

A related DNA sequence was identified in S.pyogenes <SEQ ID 4687> which encodes the amino acid 
sequence <SEQ ID 4688>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.4232 (Affirmative)

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

An alignment of the GAS and GBS proteins is shown below. 

Identities = 139/195 (71%), Positives = 165/195 (84%) 

Query: 1 MANMYYTENPNVEHDIHELNVKLLGESFSFLTDAGVFSKRMIDYGSQVLLNSLHFEKNKS 60

M MYY ENP+ HDIHE+ V+LL F+FLTD+GVFSK+M+D+GSQVLL +L+F++N+ 
Sbjct: 12 MTKJmDENPDSLHDIHEVKVELIOTPFTFL^ 71 

Query: 61 LLDLGCGYGPLGISLAKVQGVKATMVDINTRALELAKKNATRNGVVVEVFQSNIYENISK 120

+LDLGCGYGPLGISLAKVQ V AT+VDIN RAL+LA+KNAT N V V +FQSNIYENIS 
Sbjct: 72 VLDLGCGYGPLGISLAKVQRVDATLVDINNRALDLARKNATNMQVAVTIFQSNIYENISG 131

Query: 121 TFDYIISNPPIRAGKQVVHSIIEESICYLHTGGSLTIVIQKKQGAPSAKAKMLDTFGNCD 180

F++IISNPPIRAGK+VVHSIIE+SI +L G LTIVIQKKQGAPSAKAKM FGN +
Sbjct: 132 HFEHIISNPPIRAGKRVVHSIIEKSIDFLVWGDLTIVIQKKQGAPSAKAKMATIFGNVE 191 

Query: 181 ILKKDKGYYILRSEK 195 

IL+KDKGYY+LRS K 
Sbjct: 192 ILRKDKGYYVLRSIK 206 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1524 

A DNA sequence (GBSx1615) was identified in S. agalactiae <SEQ ID 4689> which encodes the amino
acid sequence <SEQ ID 4690>. This protein is predicted to be a pantothenate kinase (coaA). Analysis of this
protein sequence reveals the following:



Final Results 

bacterial cytoplasm Certainty=0.5021 (Affirmative)

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06594 GB:AP001516 pantothenate kinase [Bacillus halodurans] 
Identities = 140/307 (45%), Positives = 203/307 (65%), Gaps = 5/307 (1%) 
Query: 4 EFINFDRISRENWKDLHQQSQALLTEKELESIKSLNDNINIQDVIDIYLPLINLIQIYKR 63
+F + +SR WK L + S + E+ELE + LN+ I + +V DIY+PL L+ +
Sbjct: 8 DFFPYTVLSRSQWKSLRKASSLPIlffiQELEQLVGI^PITIJIEVMIWPLAEIiLHVHAT 67

Query: 64 SQENLSFSKAIFLKKENYQRPFIIGISGSVAVGKSTTSRLLQLLISRTFKDSHVELVTTD 123 

+ + L K F + PFIIG++GSVAVGKSTT+RLLQ L+ + HV+LVTTD

Sbjct: 68 AYQRLQQQKRGFFHHGKNRSPFIIGLaGSVAVGKSTTARLLQKLLKAWPEHHHVDLVTTD 127 

Query: 124 GFLYPNEKLIQNGILNRKGFPESYDMESLLNFLDTIKNGIT-AKIPIYSHEIYDIVPNQL 182

GFLYPNE L G++++KGFPESYD+ +L+ FL +K G K P+YSH Y+IV
Sbjct: 128 GFLYPNETLEARGLMDKKGFPESYDLPALIRFLSDVKRGEPYVKAPVYSHLTYHIVEGDY 187 



Query: 183 QTIETPDFLILEGINVFQ-NQQNHRL---YMNDYFDFSIYIDAENKQIEEWYLQRFNSLL 238

Q + PD +I+EGINV Q N++NH + +++D+FDFSIY+DA+ +QI +WY++RF L

Sbjct: 188 QVVHEPDIVIVEGI1WLQWKRNHHIPNVFVSDFFDFSIYVDAKEEQILQWYIERFKLLQ 247 

Query: 239 QLAFADPSNYYHKFTQIPPHKAMEIAKDIWKTINLVNLEKYIEPTRNRADFIIHKGKHHK 298
A DP++Y+H+F + +A + A IWK IN VNL+ I PT++RAD ++ KG HH
Sbjct: 248 NTAFQDPNSYFHRFRHLSEVFAEQFATSIWKNINGVNLHENILPTKHRADLVLQKGPHHF 307

Query: 299 IDEIYLK 305

IDE+ L+
Sbjct: 308 IDEVKLR 314 


A related DNA sequence was identified in S. pyogenes <SEQ ID 4691> which encodes the amino acid
sequence <SEQ ID 4692>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence


Final Results 

bacterial cytoplasm --- Certainty=0.4790 (Affirmative)
bacterial membrane --- Certainty=0.0000 (Not Clear)
bacterial outside --- Certainty=0.0000 (Not Clear)


An alignment of the GAS and GBS proteins is shown below. 

Identities = 219/306 (71%), Positives = 269/306 (87%) 



Query: 1 MNNEFINFDRISRENWKDLHQQSQALLTEKELESIKSLNDNINIQDVIDIYLPLINLIQI 60
M+NEFINF++ISRE+WK LHQ+++ALLT++EL+SI SLNDNI+I+DVIDIYLPLINLIQ+

Sbjct: 1 MSNEFINFEKISRESWKTLHQKAKALLTQEELKSITSLNDNISINDVIDIYLPLINLIQV 60

Query: 61 YKRSQENLSFSKAIFLKKENYQRPFIIGISGSVAVGKSTTSRLLQLLISRTFKDSHVELV 120 
YK +QENLSFSK++FLKK+ RPFIIGISGSVAVGKSTTSRLLQLL+SRT +S VELV
Sbjct: 61 YKIAQENLSFSKSLFLKKDIQLRPFIIGISGSVAVGKSTTSRLLQLLLSRTHPNSQVELV 120

Query: 121 TTDGFLYPNEKLIQNGILNRKGFPESYDMESLLNFLDTIKNGITAKIPIYSHEIYDIVPN 180
TTDGFLYPN+ LI+ G+LNRKGFPESY+ME LL+FLD+IKNG TA P+YSH+IYDI+PN
Sbjct: 121 TTDGFLYPNQFLIECGLLNRKGFPESYNMELLLDFLDSIKNGQTAFAPVYSHDIYDIIPN 180 


Query: 181 QLQTIETPDFLILEGINVFQNQQNHRLYMNDYFDFSIYIDAENKQIEEWYLQRFNSLLQL 240 

Q Q+ PDFLI+EGINVFQNQQN+RLYM+DYFDFSIYIDA++ IE WY++RF S+L+L
Sbjct: 181 QKQSFNNPDFLIVEGINVFQNQQNNRLYMSDYFDFSIYIDADSSHIETWYIERFLSILKL 240 

Query: 241 AFADPSNYYHKFTQIPPHKAMEIAKDIWKTINLVNLEKYIEPTRNRADFIIHKGKHHKID 300

A+ DP NYY ++ Q+P +A+ A+++WKT+NL NLEK+IEPTRNRA+ I+HK HKID
Sbjct: 241 AKRDPHNYYAQYAQLPRSEAIAFARN-i^iOV^ENLEKFIEPTRNRAELILHKBADHKID 300 

Query: 301 EIYLKK 306 
EIYLKK

Sbjct: 301 EIYLKK 306 
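In the pairwise alignments shown in these examples, the middle row between each Query and Sbjct line marks positions where both residues are identical with that residue letter and positive-scoring substitutions with "+". The sketch below counts those symbols; the helper `midline_counts` is hypothetical and assumes an intact midline in which a space marks a mismatch or gap. Applied to the final block above ("EIYLKK"), it gives 6 identities and 6 positives over 6 positions.

```python
def midline_counts(midline: str) -> tuple[int, int]:
    """Count identities and positives in an alignment midline.

    Assumed convention: a residue letter marks an identical position,
    '+' marks a positive-scoring substitution, and a space marks a
    mismatch or gap.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


print(midline_counts("EIYLKK"))         # (6, 6)
print(midline_counts("IQGIV P+ TNKQ"))  # (10, 11)
```

Summing these counts over every block of an alignment should reproduce the Identities and Positives totals, provided the midline is complete.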

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1525

A DNA sequence (GBSx1616) was identified in S. agalactiae <SEQ ID 4693> which encodes the amino
acid sequence <SEQ ID 4694>. Analysis of this protein sequence reveals the following: 

Possible site: 59 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3866 (Affirmative)

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05058 GB:AP001511 ribosomal protein S20 (BS20) [Bacillus halodurans] 
Identities = 47/86 (54%) , Positives = 59/86 (67%) , Gaps = 7/86 (8%) 


Query: 3 VKTLANIKSAIKRAELNVKQNEKNSAQKSAMRTAIKAFEA NPSEELYRA ASSS 55

+K ANIKSAIKR + N K+ +N++ KSA+RTAIK FEA N E +A A+ 
Sbjct: 1 MKGNANIKSAIKRVKTNEKRRIQNASVKSALRTAIKQFEAKVENNDAEAAKAAFVEATKK 60

Query: 56 IDKAASKGLIHTNKASRDKARLATKL 81

+DKAA+KGLIH N ASR K+RLA KL 
Sbjct: 61 LDKAANKGLIHKNAASRQKSRLAKKL 86

A related DNA sequence was identified in S. pyogenes <SEQ ID 4695> which encodes the amino acid
sequence <SEQ ID 4696>. Analysis of this protein sequence reveals the following:

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3872 (Affirmative)

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

An alignment of the GAS and GBS proteins is shown below. 

Identities = 76/82 (92%) , Positives = 78/82 (94%)

Query: 1 MEVKTLANIKSAIKRAELNVKQNEKNSAQKSAMRTAIKAFEANPSEELYRAASSSIDKAA 60 

+EVKTLANIKSAIKRAELNVK NEKNSAQKSAMRTAIKAFEANPSEEL+RAASSSIDKA 
Sbjct: 1 LEVKTLANIKSAIKRAELNVKANEKNSAQKSAMRTAIKAFEANPSEELFRAASSSIDKAE 60


Query: 61 SKGLIHTNKASRDKARLATKLG 82 

SKGLIH NKASRDKARLA KLG 
Sbjct: 61 SKGLIHKNKASRDKARLAAKLG 82 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1526 

A DNA sequence (GBSx1617) was identified in S. agalactiae <SEQ ID 4697> which encodes the amino
acid sequence <SEQ ID 4698>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.99 Transmembrane 31 - 47 ( 25 - 51) 

Final Results 

bacterial membrane --- Certainty=0.5394 (Affirmative)
bacterial outside Certainty=0.0000 (Not Clear)
bacterial cytoplasm --- Certainty=0.0000 (Not Clear)
The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC35851 GB:AF086736 amino acid-binding protein Abp 
[Streptococcus uberis]
Identities = 169/269 (62%) , Positives = 203/269 (74%) , Gaps = 2/269 (0%) 
PYM N QV+VTK SS+I S MKGK LGAQSGSSG+DAF + P +LK VK +A QY+
PYMKNEQVLVTKKSSNITSFAAMKGKVLGAQSGSSGYDAFTSNPKVLKDIVKDNDATQYE 181

TFTQALIDLKNNRIDGLLIDEVYANYYLKQEGNIKAYYFVKTAYQGENFVVGARKVDRRL 268
TF QA IDLKN+RIDGLLID+VYANYYLKQEG + Y VK+ + GE+F VG RK D+ L 
TFIQAFIDLKNDRIDGLLIDKVYANYYLKQEGELTNYNIVKSEFDGEDFAVGVRKEDKIL 241

IEKINKAFKQLHNKGRFQKISYKWFGEDV 297
++ IN AF +L+ G+FQ+IS KWFGEDV 
LKNINSAFTKLYKTGKFQEISQKWFGEDV 270 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4699> which encodes the amino acid 
sequence <SEQ ID 4700>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> May be a lipoprotein

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

bacterial cytoplasm Certainty=0.0000 (Not Clear)

The protein has homology with the following sequences in the databases:

>GP:AAC35851 GB:AF086736 amino acid-binding protein Abp 
[Streptococcus uberis] 
Identities = 176/277 (63%), Positives = 220/277 (78%), Gaps = 1/277 (0%)

Query: 1 MIIKKRTVAILAIASSFFLVACQATKSLKSGDAWGVYQKQKSITVGFDNTFVPMGYKDES 60 

M +KK + LA+AS+ FLVAC + + K+ D W Y+K+KSIT+GFDNTFVPMG+KDES 
Sbjct: 1 MNLKKILLTTLALASTLFLVACGKSSAaKT-DQWDTYKKEKSITLGFDNTFVPMGFKDES 59 

Query: 61 GRCKGFDIDLAKEVFHQYGLKVNFQAINWDMKEAELNNGKIDVIWNGYSITKERQDKVAF 120 

G+ GFD++LAK VF +YG+KV FQ INWD+KE EL NGKID+IWNGYS+TKERQ KVAF 
Sbjct: 60 GKNTGFDVEIAKAVFQEYGIKVKFQPINWDLKETELKNGKIDMIWNGYSVTKERQAKVAF 119 

Query: 121 TDSYMRNEQIIVVKKRSDIKTISDMKHKVLGAQSASSGYDSLLRTPKLLKDFIKNKDANQ 180

+ YM+NEQ++V KK S+I + + MK KVLGAQS SSGYD+ PK+LKD +K+ DA Q 
Sbjct: 120 STPYMKNEQVLWKKSSNITSFAAMKGKVI^AQSGSSGYDAFTSNPKVLKDIVKDNDATQ 179 

Query: 181 YETFTQAFIDLKSDRIDGILIDKVYANYYLAKEGQLENYRMIPTTFENEAFSVGLRKEDK 240 

YETF QAFIDLK+DRIDG+LIDKVYANYYIj +EG+L NY ++ + F+ E F+VG+RKEDK 
Sbjct: 180 YETFIQAFIDLKNDRIDGLLIDKVYANYYLKQEGELTNYNIVKSEFDGEDFAVGVRKEDK 239 
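In each Query and Sbjct row above, the left-hand number is the position of the row's first residue and the right-hand number is the position of its last residue. Gap characters ('-') occupy alignment columns but are not residues. The short Python sketch below makes that bookkeeping explicit; the helper name `segment_end` is hypothetical. Applied to the first Sbjct row of the Abp alignment (60 columns, one gap), it reproduces the printed end position of 59.

```python
def segment_end(start: int, aligned_row: str) -> int:
    """Return the last residue position covered by one aligned row.

    Only non-gap characters are residues, so only they advance the coordinate.
    """
    residues = sum(1 for ch in aligned_row if ch != "-")
    return start + residues - 1


# First Sbjct row of the Abp alignment (Sbjct: 1 ... 59).
sbjct_row = "MNLKKILLTTLALASTLFLVACGKSSAAKT-DQWDTYKKEKSITLGFDNTFVPMGFKDES"
print(len(sbjct_row), segment_end(1, sbjct_row))  # 60 59
```

The next Sbjct row therefore starts at position 60, which matches its printed label.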
An alignment of the GAS and GBS proteins is shown below.

Identities = 151/266 (56%), Positives = 189/266 (70%) , Gaps = 4/266 (1%) 

Query: 32 LLTIIFGLFMIILSACGMSNKEMAGIDNWEHYQKEKKITIGFDNTFVPMGFESRSGDYTG 91

+L I F++ AC + K + D W YQK+K IT+GFDNTFVPMG++ SG G
Sbjct: 10 ILAIASSFFLV---AC-QATKSLKSGDAWGVYQKQKSITVGFDNTFVPMGYKDESGRCKG 65

Query: 92 FDIDl^AWKEYGISVKWQPINTOMKETELKNGNI^^ 151

FDIDLA VF +YG+ V +Q INWDMKE ELNNG ID+IWNGYS T ER KVAFT+ YM
Sbjct: 66 FDIDLAKEVFHQYGLKVNFQAINWDMKEAELNNGKIDVIWNGYSITKERQDKVAFTDSYM 125

Query: 152 NNHQVIVTKTSSHINSIKDMKGKKLGAQSGSSGFDAFNAKPDILKKFVKGKEAVQYDTFT 211

N Q+IV K S I +I DMK K LGAQS SSG+D+ P +LK F+K K+A QY+TFT
Sbjct: 126 RNEQIIVVKKHSDIKTISDMKHKVLGAQSASSGYDSLLRTPKLLKDFIKNKDANQYETFT 185

Query: 212 QALIDLKNNRIDGLLIDEVYANYYLKQEGNIKAYYFVKTAYQGENFVVGARKVDRRLIEK 271

QA IDLK++RIDG+LID+VYANYYL +EG ++ Y + T ++ E F VG RK D+ L K
Sbjct: 186 QAFIDLKSDRIDGILIDKVYANYYLAKEGQLENYRMIPTTFENEAFSVGLRKEDKTLQAK 245

Query: 272 INKAFKQLHNKGRFQKISYKWFGEDV 297

IN+AF+ L+ G+FQ IS KWFG+DV
Sbjct: 246 INRAFRVLYQNGKFQAISEKWFGDDV 271
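In BLAST-style output such as this, the unlabelled line between each Query row and its Sbjct row is the match line. An identical residue is printed as itself, a substitution that scores positively in the substitution matrix is printed as '+', and anything else, including a gap, is left blank. The Python sketch below rebuilds such a line. It does not use a real substitution matrix: `SIMILAR_GROUPS` is a hypothetical, simplified stand-in for the positive entries of a matrix such as BLOSUM62, so it will not reproduce every '+' in the alignments above. The function names are also illustrative.

```python
# Hypothetical similarity groups: a simplified stand-in for the positive
# entries of a real substitution matrix such as BLOSUM62.
SIMILAR_GROUPS = ("ILVM", "KR", "DE", "ST", "FYW", "NQ")


def is_positive(a: str, b: str) -> bool:
    """True if both residues fall in the same hypothetical similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def match_line(query: str, sbjct: str) -> str:
    """Build a BLAST-style match line for two equal-length aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    out = []
    for q, s in zip(query, sbjct):
        if q == s and q != "-":
            out.append(q)    # identical residue: print it
        elif q != "-" and s != "-" and is_positive(q, s):
            out.append("+")  # similar residue under the toy groups
        else:
            out.append(" ")  # mismatch or gap
    return "".join(out)


print(repr(match_line("NEQIIVVKK", "NEQVLVTKK")))  # 'NEQ++V KK'
```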



A related GBS gene <SEQ ID 8833> and protein <SEQ ID 8834> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: 22 Crend: 4

Sequence Pattern: CGMS
SRCFLG: 0

McG: Length of UR: 22
Peak Value of UR: 3.05

Net Charge of CR: 2
McG: Discrim Score: 11.16
GvH: Signal Score (-7.5): -1.96
Possible site: 24
>>> May be a lipoprotein

Amino Acid Composition: calculated from 23
ALOM program count: 0 value: 8.96 threshold: 0.0
PERIPHERAL Likelihood = 8.96 68
modified ALOM score: -2.29

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

62.2/75.8% over 270aa
Streptococcus uberis
GP|3603430| amino acid-binding protein Abp Insert characterized
ORF00904(385 - 1203 of 1503)

GP|3603430|gb|AAC35851.1||AF086736(4 - 274 of 277) amino acid-binding protein Abp
{Streptococcus uberis}
%Match = 34.8
%Identity = 62.1 %Similarity = 75.7
Matches = 169 Mismatches = 65 Conservative Sub.s = 37



[Alignment display of ORF00904 against AAC35851.1 (amino acid-binding protein Abp, Streptococcus uberis)]



SEQ ID 8834 (GBS225) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 10; MW 32kDa). The GBS225-His fusion product was purified (Figure 
205, lane 7) and used to immunise mice. The resulting antiserum was used for FACS (Figure 266), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1527 

A DNA sequence (GBSx1618) was identified in S.agalactiae <SEQ ID 4701> which encodes the amino
acid sequence <SEQ ID 4702>. This protein is predicted to be arginine ABC transporter, ATP-binding
protein (glnQ). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.3229 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
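Each Final Results block lists one certainty score per bacterial compartment. The compartment marked Affirmative is the predicted localisation; in the block above, that is the cytoplasm at 0.3229. The Python sketch below extracts these scores from cleanly formatted result lines and selects the Affirmative entry with the highest score. The pattern and the function names are assumptions about the printed layout and are not part of the PSORT program.

```python
import re
from typing import Dict, Optional, Tuple

# One result line, e.g.
# "bacterial cytoplasm --- Certainty=0.3229 (Affirmative) < succ>"
RESULT_LINE = re.compile(
    r"(bacterial (?:membrane|outside|cytoplasm))\W*Certainty=\s*(\d+(?:\.\d+)?)\s*\(([^)]*)\)"
)


def parse_final_results(text: str) -> Dict[str, Tuple[float, str]]:
    """Map each compartment to (certainty, verdict)."""
    return {
        site: (float(score), verdict)
        for site, score, verdict in RESULT_LINE.findall(text)
    }


def predicted_site(results: Dict[str, Tuple[float, str]]) -> Optional[str]:
    """Return the Affirmative compartment with the highest certainty, if any."""
    affirmative = {s: c for s, (c, v) in results.items() if v == "Affirmative"}
    if not affirmative:
        return None
    return max(affirmative, key=affirmative.__getitem__)


block = """bacterial cytoplasm --- Certainty=0.3229 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>"""
print(predicted_site(parse_final_results(block)))  # bacterial cytoplasm
```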

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB49429 GB:U73111 glutamine transport ATP-binding protein GLNQ 
[Salmonella typhimurium] 
Identities = 94/210 (44%), Positives = 146/210 (68%), Gaps = 3/210 (1%)

Query: 1 MLELKNISKCYGQKEIFKDFNLTVEEGKILSLVGPSGGGKTTLLRMLAGLEKIDSGTIVH 60

M+E KN+SK +G ++ + +L + +G+++ ++GPSG GK+TLLR + LE+I SG ++
Sbjct: 1 MIEFKNVSKHFGPTQVLHNIDLNIRQGEVVVIIGPSGSGKSTLLRCINKLEEITSGDLIV 60

Query: 61 DGKEVS---VDHLETLNLLGFVFQDFQLFPHLTVLDNLILSPVKTMGLSKELAKEKALVL 117

DG +V+ VD G VFQ F LFPHLT L+N++ P++ G+ KE A+++A L

Sbjct: 61 DGLKVNDPKVDERLIRQEAGMVFQQFYLFPHLTALENVMFGPLRVRGVKKEEAEKQAKAL 120

Query: 118 LERLGLKDHALVYPFSLSGGQKQRVALARAMMIDPQIIGYDEPTSALDPELRQEVEKLIL 177

L ++GL + A YP LSGGQ+QRVA+ARA+ + P+++ +DEPTSALDPELR EV K++
Sbjct: 121 LAKVGLAERAHHYPSELSGGQQQRVAIARAIAVKPKMMLFDEPTSALDPELRHEVLKVMQ 180

Query: 178 QNRETGMTQIVVTHDLQFAESISDTILKIN 207
E GMT ++VTH++ FAE ++ ++ I+
Sbjct: 181 DLAEEGMTMVIVTHEIGFAEKVASRLIFID 210

A related DNA sequence was identified in S.pyogenes <SEQ ID 4703> which encodes the amino acid
sequence <SEQ ID 4704>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2146 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 164/209 (78%), Positives = 183/209 (87%)





Query: 1 MLELKNISKCYGQKEIFKDFNLTVEEGKILSLVGPSGGGKTTLLRMLAGLEKIDSGTIVH 60
MLELKNISK +GQK IF FNLTV++G++LSLVGPS GGKTTLLRMLAGLE IDSG + +
Sbjct: 1 MLELKNISKQFGQKTIFDGFNLTVQDGEVLSLVGPSSGGKTTLLRMLAGLESIDSGQVFY 60

Query: 61 DGKEVSVDHLETLNLLGFVFQDFQLFPHLTVLDNLILSPVKTMGLSKELAKEKALVLLER 120
+G++V +DHLE NLLGFVFQDFQLFPHLTVLDNL LSP TMG K AKEKAL LL R
Sbjct: 61 NGEDVGIDHLENRNLLGFVFQDFQLFPHLTVLDNLTLSPTITMGKQKADAKEKALDLLAR 120

Query: 121 LGLKDHALVYPFSLSGGQKQRVALARAMMIDPQIIGYDEPTSALDPELRQEVEKLILQNR 180
LGLK+HA VYP+SLSGGQKQRVALARAMMIDPQIIGYDEPTSALDPELRQ VE LI+QNR
Sbjct: 121 LGLKEHAQVYPYSLSGGQKQRVALARAMMIDPQIIGYDEPTSALDPELRQTVEALIVQNR 180

Query: 181 ETGMTQIVVTHDLQFAESISDTILKINPK 209
E G+TQIVVTHDL FAE+ISD I+++NPK
Sbjct: 181 EMGITQIVVTHDLVFAEAISDRIIRVNPK 209





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1528

A DNA sequence (GBSx1619) was identified in S.agalactiae <SEQ ID 4705> which encodes the amino
acid sequence <SEQ ID 4706>. This protein is predicted to be amino acid ABC transporter, permease
protein (glnP). Analysis of this protein sequence reveals the following:

Possible site: 15
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.12 Transmembrane 102 - 118 ( 96 - 120)

Final Results

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9341> which encodes amino acid sequence <SEQ ID 9342>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA98402 GB:AP002545 ABC amino acid transporter permease 
[Chlamydophila pneumoniae J138] 
Identities = 55/127 (43%), Positives = 83/127 (65%), Gaps = 1/127 (0%) 

Query: 3 AAIIAFTMNYAAYFAEIFRGGIESIPKGQYEAAKVLKFSKFQTVWYIVLPQVFKIVLPSV 62

A IIA +MN AAY AE RGGI S+ GQ+E+A VL + K+Q YI+ PQVFK +LPS+
Sbjct: 89 AGIIALSMNSAAYLAENIRGGINSLSIGQHESAIVLGYKKYQIFVYIIYPQVFKNILPSL 148

Query: 63 FNETITLVKDSSLVYILGVGDLLLESKTAANRDATIAPMF-IAGGIYLLLIGLLTILSKQ 121

NE ++L+K+SS++ ++GV +L +K +R+ M+ I G+Y L+ + +S+ 

Sbjct: 149 T^FVSLIKESSILMWGVPELTKVTKDIVSRELNPMEMYLICAGLYFLMTSSFSCISRL 208 

Query: 122 VEKRFNY 128 

EKR +Y 
Sbjct: 209 SEKRRSY 215 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4707> which encodes the amino acid
sequence <SEQ ID 4708>. Analysis of this protein sequence reveals the following:

Possible site: 34
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.57 Transmembrane 21 - 37 ( 7 - 44) 
INTEGRAL Likelihood =-10.93 Transmembrane 185 - 201 ( 178 - 206) 
INTEGRAL Likelihood = -3.29 Transmembrane 63 - 79 ( 62 - 81) 
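Each INTEGRAL line gives an ALOM likelihood followed by two residue ranges. Every first range listed here spans 17 residues (for example, 21 - 37). Reading that first range as the predicted transmembrane segment and the bracketed range as a wider surrounding window is an assumption about the layout; the text does not state it. The Python sketch below only splits such a line into those fields. It does not reimplement ALOM scoring, and the names `TMSegment` and `parse_alom` are illustrative.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TMSegment(NamedTuple):
    likelihood: float
    core: Tuple[int, int]      # first residue range on the line
    extended: Tuple[int, int]  # range given in parentheses


ALOM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_alom(line: str) -> Optional[TMSegment]:
    """Split one INTEGRAL line into its likelihood and residue ranges."""
    m = ALOM_LINE.search(line)
    if m is None:
        return None
    lik, a, b, c, d = m.groups()
    return TMSegment(float(lik), (int(a), int(b)), (int(c), int(d)))


seg = parse_alom("INTEGRAL Likelihood =-11.57 Transmembrane 21 - 37 ( 7 - 44)")
print(seg)
print("core length:", seg.core[1] - seg.core[0] + 1)  # core length: 17
```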

Final Results 

bacterial membrane --- Certainty=0.5628 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 4 IQQVLPSLLDGALVTLQVFFIVIILSIPLGAIEAFLKKIPFKPLQWFLTLYVWMMRGTPL 63
IQ +P +L+G VTLQ + ++ + LG +LA ++ +WF Y + RGTPL
Sbjct: 8 IQPFMPFMLEGVWVTLQFVSVSLLFGLVLGIVTAIFKISKYRLFRWFADFYTSIFRGTPL 67

Query: 64 LLQLIFFYYVLPSVGISFDRMPAAILAFTLNYAAYFAEIFRGGIEAIPKGQYEAAKVLKL 123
+LQL+ Y LP G+ + AA LAF LN AAY +EI R GI+A+ KGQ EAA+ L +
Sbjct: 68 ILQLLMIYLALPQFGVDISQFQAAFLAFGLNSAAYVSEIIRAGIQAVDKGQREAAEALGI 127

Query: 124 KPLQTIRYIILPQVFKIVLPSVFNEVINLVKDSSLVYVLGVGDLL-IASKTAANRDATLA 182
+ IILPQ + +LP++FNE INL K+S++V V+GV DL+ A T+A L
Sbjct: 128 PYRPMMLRIILPQAMRNILPALFNEFINLTKESAIVSVIGVTDLMRRAQITSAETYLYLE 187

Query: 183 PMFIAGLIYLLLIGLVTIISKQVEKR 208
P+ GLIY +L+ +T+I + +E+R
Sbjct: 188 PLLFVGLIYYVLVMGLTVIGRLLERR 213



An alignment of the GAS and GBS proteins is shown below. 

Identities = 112/130 (86%), Positives = 121/130 (92%)

Query: 1 MPAAIIAFTMNYAAYFAEIFRGGIESIPKGQYEAAKVLKFSKFQTVWYIVLPQVFKIVLP 60

MPAAI +AFT+NYAAYFAEIFRGGIE+IPKGQYEAAKVLK QT+ YI+LPQVFKIVLP
Sbjct: 84 MPAAILAFTLNYAAYFAEIFRGGIEAIPKGQYEAAKVLKLKPLQTIRYIILPQVFKIVLP 143

Query: 61 SVFNETITLVKDSSLVYILGVGDLLLESKTAANRDATIAPMFIAGGIYLLLIGLLTILSK 120

SVFNE I LVKDSSLVY+LGVGDLLL SKTAANRDATLAPMFIAG IYLLLIGL+TI+SK
Sbjct: 144 SVFNEVINLVKDSSLVYVLGVGDLLIASKTAANRDATLAPMFIAGLIYLLLIGLVTIISK 203

Query: 121 QVEKRFNYYK 130 

QVEKRFNYY+ 
Sbjct: 204 QVEKRFNYYQ 213 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



WO 02/34771 PCT/GB01/04789
-1698-

Example 1529 

A DNA sequence (GBSx1620) was identified in S.agalactiae <SEQ ID 4709> which encodes the amino
acid sequence <SEQ ID 4710>. This protein is predicted to be minidiscs. Analysis of this protein sequence
reveals the following:

Possible site: 61
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.66 Transmembrane 44 - 60 ( 39 - 66)
INTEGRAL Likelihood = -7.96 Transmembrane 129 - 145 ( 123 - 147)
INTEGRAL Likelihood = -5.15 Transmembrane 13 - 29 ( 9 - 33)
INTEGRAL Likelihood = -2.39 Transmembrane 94 - 110 ( 94 - 110)



Final Results

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 7 IKQTYGLMTTIAMIVGVVIGSGIYFKVDDILKFTGGDVFLGMVILVLGSFSIVFGSLSIS 66

+K+ GL+ +A+IVGV++GSGI + +LKF+ G + +++ VL + G+L +

Sbjct: 39 LKKQIGLLDGVAIIVGVIVGSGIFVSPKGVLKFS-GSIGQSLIVWVLSGVLSMVGALCYA 97

Query: 67 ELAIRTSESGGIFSYYEKYVSPALAATLGLFASFLYL-PTLTAIVSWVAAFYTLGE 121

EL +SGG ++Y P L A L L+ + L L PT AI + A Y L

Sbjct: 98 ELGTMIPKSGGDYAYIGTAFGP-LPAFLYLWVALLILVPTGNAITALTFAIYLLKPFMPS 156

Query: 122 -SSSLESQIILAAVYILALSLMNIF 145
+ +E+ +LAA I L+L+N +

Sbjct: 157 CDAPIEAVQLLAAAMICVLTLINCY 181

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1530 

A DNA sequence (GBSx1621) was identified in S.agalactiae <SEQ ID 4711> which encodes the amino
acid sequence <SEQ ID 4712>. Analysis of this protein sequence reveals the following:

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1531

A DNA sequence (GBSx1622) was identified in S.agalactiae <SEQ ID 4713> which encodes the amino
acid sequence <SEQ ID 4714>. This protein is predicted to be TRK potassium uptake system protein.
Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.06 Transmembrane 232 - 248 ( 232 - 248)

Final Results

bacterial membrane --- Certainty=0.1022 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8835> which encodes amino acid sequence <SEQ ID 8836>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 5
GvH: Signal Score (-7.5): -3.64
Possible site: 27
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -0.06 threshold:
INTEGRAL Likelihood = -0.06 Transmembrane 228 - 244 ( 228 - 244)
PERIPHERAL Likelihood = 1.27
modified ALOM score: 0.51

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.1022 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB90401 GB:AE001046 TRK potassium uptake system protein 
(trkA-2) [Archaeoglobus fulgidus] 
Identities = 136/446 (30%), Positives = 238/446 (52%), Gaps = 12/446 (2%)

Query: 5 MRIIVVGGGKVGTALCRSLVAEKHDVVLIEKKENVLKRVTKQHDIMGIVGNGANYKILEQ 64

MRI++ G G+VG L SL A HDV++IEK + +RV++ D++ I GN AN K+L
Sbjct: 1 MRIVIAGAGEVGYHLAMSL-APNHDVIIIEKDVSRFERVSEL-DVVAINGNAANMKVLRD 58

Query: 65 AEVKNCDIFIAITDRDEVNMISAVLAKKIGAKETVVRMRNPEYSNPYFKDKNFLGFSSVV 124

A V+ D+F+A+T DEVN++S + AKK+GAK +VR+ NPEY + ++ LG+ ++

Sbjct: 59 AGVERADVFLAVTGNDEVNLLSGLAAKKVGAKNVIVRVENPEYVDRPIVKEHPLGYDVLI 118

Query: 125 NPELLAAQYIANTIEFPNATSVEHFANGRVMLMEFKILEGNKLCHTSMSQIRKKFGNIVI 184 

P+L AQ A I P A V F+ G+V ++E +++EG+K +++ + N+VI 
Sbjct: 119 CPQLSLAQEAARLIGIPGAIEVVTFSGGKVEMIELQVMEGSKADGKAIADLYLP-QNVVI 177

Query: 185 CAIERDGKLIIPDGDATIQVKDKIFVTGNRIEMILFHNYVKNKV\nQ^LWIGAGRIAYYL 244

+ I R+G + IP GD ++ D++ + ++ + V + + + GAG I Y

Sbjct: 178 ASIYRNGHIEIPRGDTVLRAGDRVAIVSKTEDVEMLKGIFGPPVTRRVTIFGAGTIGSYT 237

Query: 245 LNILK^^IOTHVKLVEIas^QEQAEYFSQEFPNVPVVHGDGTAKNILLEESVTSFDAVATLTG 304 

IL T VKL+E + E+ E S E V +V GD T L+EE + DAV T 
Sbjct: 238 AKIIAKGMTSVKLIESSMERCEALSGELEGVRIVCGDATDIEFLIEEEIGKSDAVLAATE 297

Query: 305 VDEENIITSMFLESIGIPKNITKVNRTSLLEIIDDKQLSSIITPKRIAVDHVMHFVRGRV 364 

DE+N++ S+ +++G I KV + +++ + + + P+ + + V +R 
Sbjct: 298 SDEKNLLISLLSKNLGARIAIAKVEKKEYVKLFEAVGVDVALNPRSVTYNEVSKLLR--- 354



Query: 365 NAQDS^F^HHIANDRIETLQFEIKETSKLAWRSLASLKLKQNILIAAIIRNNKTIFPT 424 
Sbjct: 355 TMRIETLAEIEGTAVVEV- -

Query: 425 GEDVLTVGDRIWITLLKNITRTSDM 450 

G+ + DR++V I + ++

Sbjct: 409 GDTTIEYEDRLLVFAKWDEIEKIEEI 434 
Identities = 48/212 (22%), Positives = 99/212 (46%), Gaps = 15/212 (7%) 

Query: 3 VKMRIIVVGGGKVGTALCRSLVAEKHDVVLIEKKENVLKRVTKQHDIMGIV-GNGANYKI 61 

V R+ + G G +G+ + L V LIE +++ + + + IV G+ + + 

Sbjct: 221 VTRRVTIFGAGTIGSYTAKILAKGMTSVKLIESSMERCEALSGELEGVRIVCGDATDIEF 280

Query: 62 LEQAEVKNCDIFIAITDRDEVNMISAVLAKKIGAKETVVRMRNPEYSNPYFKDKNFLGFS 121

L + E+ D +A T+ DE N++ ++L+K +GA+ + ++ EY + +G
Sbjct: 281 LIEEEIGKSDAVLAATESDEKNLLISLLSKNLGARIAIAKVEKREYVKLF----EAVGVD 336

Query: 122 SWNPELLAAQYIA NTIEFPNATSVEHFANGRVMLKEFKILEGNKLCHTSMEQIRKK 178

+NP + ++ T+ +E A V++ +++ G L + +
Sbjct: 337 VALNPRSVTYNEVSKLLRTMRIETLAEIEGTAVVEVVVRNTRLV-GKALKDLPLPK 391

Query: 179 FGNIVICAIERDGKLIIPDGDATIQVKDKIFV 210

+ +I AI R + +IP GD TI+ +D++ V
Sbjct: 392 --DAIIGAIVRGNECLIPRGDTTIEYEDRLLV 421

There is also homology to SEQ ID 4716. 

SEQ ID 8836 (GBS384) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 69 (lane 2; MW 53kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 6; MW 78kDa). 

The GBS384-GST fusion product was purified (Figure 212, lane 9) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 279), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1532

A DNA sequence (GBSx1623) was identified in S.agalactiae <SEQ ID 4717> which encodes the amino
acid sequence <SEQ ID 4718>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4948 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1533

A DNA sequence (GBSx1624) was identified in S.agalactiae <SEQ ID 4719> which encodes the amino
acid sequence <SEQ ID 4720>. Analysis of this protein sequence reveals the following:



Possible site: 22
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -?.58 Transmembrane 37 - 53 ( 33 - 61)
INTEGRAL Likelihood = -?.57 Transmembrane 183 - 199 ( 179 - 214)
INTEGRAL Likelihood = -10.03 Transmembrane 397 - 413 ( 392 - 424)
INTEGRAL Likelihood = -6.79 Transmembrane 14 - 30 ( 5 - 31)
INTEGRAL Likelihood = -6.42 Transmembrane 71 - 87 ( 69 - 93)
INTEGRAL Likelihood = -?.99 Transmembrane 278 - 294 ( 274 - 295)
INTEGRAL Likelihood = -4.19 Transmembrane 133 - 149 ( 132 - 152)
INTEGRAL Likelihood = -4.09 Transmembrane 327 - 343 ( 324 - 344)
INTEGRAL Likelihood = -?.44 Transmembrane 236 - 252 ( 234 - 252)
INTEGRAL Likelihood = -0.55 Transmembrane 456 - 472 ( 456 - 472)

Final Results

bacterial membrane --- Certainty=0.6031 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10065> which encodes amino acid sequence <SEQ ID 
10066> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB90400 GB:AE001046 TRK potassium uptake system protein (trkH) 
[Archaeoglobus fulgidus] 
Identities = 166/480 (34%) , Positives = 262/480 (54%) , Gaps = 10/480 (2%) 

Query: 1 MNKSMIRFLLSKLLLIEAALLAIPLTVGLIYREP-QSVMMSIVITMIILIILGLLGSLFK 59
MN + +L KLL++ + +PL ++ EP ++ +++++ +LG G +
Sbjct: 1 MNLRLTASILGKLLMLFSFSFILPLIAAHVFEEPYHPFLIPAALSLLVGAVLGY-GIKTE 59

L +LL WRS T IGGMG++V LAI

Query: 180 TAQILYLLYLLMFAVFAVILYFAGMPFFDSIIIAMGTAGTGGFAVYNDSIAHYNSPLITN 239
TA LY +YLL+ +LY G+ FD+I T TGG++ +++SIA + +
Sbjct: 177 TALSLYKVYLLLTIAEVALLYALGLSLFDAINHTFTTLSTGGYSTHSESIAFFKDVRVEA 236

I- Y+ +A+A+ +IA

F+ +I+TTTGF

K A +Q+L P V +

+++IL+ LMFIGGS+GST GG KV+R +L

Query: 420 WISAAASCFNNIGP---LLGSNETFSFFSPFSKLLLSFAMIAGRLEIYPVLLMFIPKTW 476
ISA A+ N+GP L G+ E ++ F +K+LL+ M GRLEI+ V+ +FIP W
Sbjct: 415 TSISATAATLGNVGPGLGIAGAAENYASFPSLTKILLAVNMWIGRLEIFTVVSLFIPTFW 474



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1534

A DNA sequence (GBSx1625) was identified in S.agalactiae <SEQ ID 4721> which encodes the amino
acid sequence <SEQ ID 4722>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence (or aa 1-20)

Final Results

bacterial cytoplasm --- Certainty=0.2870 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD36530 GB:AE001797 conserved hypothetical protein

[Thermotoga maritima]
Identities = 43/75 (57%), Positives = 57/75 (75%), Gaps = 1/75 (1%)

Query: 2 LKSFLIFLVRFYQKNISPAFPASCRYRPTCSTYMIEAIQKHG-LKGVLMGIARILRCHPL 60
+K LI L+RFYQ+ ISP P +CR+ PTCS Y I+A++KHG LKG +G+ RILRC+PL

Sbjct: 1 MKKLLIMLIRFYQRYISPLKPPTCRFTPTCSNYFIQALEKHGLLKGTFLGLRRILRCNPL 60

Query: 61 AHGGNDPVPDHFSLR 75
+ GG DPVP+ FS +
Sbjct: 61 SKGGYDPVPEEFSFK 75

A related DNA sequence was identified in S.pyogenes <SEQ ID 4723> which encodes the amino acid 

sequence <SEQ ID 4724>. Analysis of this protein sequence reveals the following: 

Possible site: 38 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3639 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 53/78 (67%), Positives = 60/78 (75%)

Query: 1 MLKSFLIFLVRFYQKNISPAFPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPL 60

M+K LI V+ YQK ISP P SCRY+PTCS YM+ AI+KHG KG+LMGIARILRCHP
Sbjct: 1 MMKKLLIVSVKAYQKYISPLSPPSCRYKPTCSAYMLTAIEKHGTKGILMGIARILRCHPF 60

Query: 61 AHGGNDPVPDHFSLRRNK 78
GG DPVP+ FSL RNK

Sbjct: 61 VAGGVDPVPEDFSLMRNK 78

SEQ ID 4722 (GBS233) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 58 (lane 3; MW 35.6kDa).
The GBS233-GST fusion product was purified (Figure 207, lane 10) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 280), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1535

A DNA sequence (GBSx1626) was identified in S.agalactiae <SEQ ID 4725> which encodes the amino
acid sequence <SEQ ID 4726>. This protein is predicted to be ribosomal large subunit pseudouridine 
synthase B (rluB). Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2957 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05295 GB:AP001512 pseudouridylate synthase [Bacillus halodurans] 
Identities = 130/239 (54%), Positives = 175/239 (72%), Gaps = 2/239 (0%)

Query: 2 RINKYIAHAGIASRRKAEELIKQGMOTINGQVVNELATQVKAG-DLVEIEGSPIYNEEKV 60
R+ K IA AGIASRRKAE+LI +G V +NGQVV EL +V D +E+EG P+ EE V
Sbjct: 3 RLQKVIAQAGIASRRKAEQLILEGKVKVNGQVVKELGIKVNPNQDDIEVEGVPVEKEEPV 62

Query: 61 YYLLNKPRGVISSVSDDKGRKTVIDLLPQVKERIYPVGRLDWDTTGLLILTNDGDFTDKM 120
Y+LL KP GVISSV DDKGRK V D L ++++R+YPVGRLD+DT+GLL+LTNDG+F + +
Sbjct: 63 YFLLYKPTGVISSVKDDKGRKVVTDFL-EIEQRVYPVGRLDYDTSGLLLLTNDGEFANLL 121

Query: 121 IHPRNEIDKVYLARVKGIATKENLRPLTRGVVIDGKKTKPARYTIIKVDHEKNRSVVELT 180
+HPR+ + I +KVY+A+VKGI T++ L+ L RGV ++ T PA+ ++ VD K ++V+LT
Sbjct: 122 MHPRHKIEKVWAKVKGIPTRDQLKLLARGVKI^DGPTAPAKVKMLSVDRRKQTAIVKLT 181

Query: 181 IHEGRNHQVKKMFEQVGLLVDKLSRTQFGTLDLTGLRPGEARRLNKKEISQLHNAAINK 239
IHEGRN QV++MFE +G V KL R QF LDL+G+ PSt E L E+ L A+ K
Sbjct: 182 IHEGRNRQVRRMFETIGCEVMKLKREQFAFLDLSGMNPGDVRPLKPIEVKHLRELAVTK 240

A related DNA sequence was identified in S.pyogenes <SEQ ID 4727> which encodes the amino acid
sequence <SEQ ID 4728>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.1587 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 210/239 (87%), Positives = 228/239 (94%)

Query: 1 I^INKYIAHAGIASRRKAEELIKQG^r^INGQvA/I^EIATQVXAGDLVEIEGSPIYNEEKV 60

MRINKYIAHAGIASRRKAEELIKQG+VT+NGQV+ +LAT VK+GD+VEIEGSPIYNEEKV
Sbjct: 9 MRINKYIAHAGIASRRKAEELIKC^LWIjNGQVITDLATTVTCSGDVVEIEGSPIYNEEKV 68

Query: 61 YYLLNKPRGVISSVSDDKGRKTVIDLLPQVKERIYPVGRLDWDTTGLLILTNDGDFTDKM 120

YYLLNKPRG ISSVSDDKGRKTV+DLLPQVKERIYPVGRLDWDT+G+LILTNDGDFTD M
Sbjct: 69 YYLLNKPRGAISSVSDDKGRKTVLDLLPQVKERIYPVGRLDWDTSGVLILTNDGDFTDTM 128

Query: 121 IHPRNEIDKVYLARVKGIATKENLRPLTRGVVIDGKKTKPARYTIIKVDHEKNRSVVELT 180

IHPRNEIDKVYLARVKGIATKENLRPLTRG+VIDGKKTKPARY I++V+ +K+RS+VELT
Sbjct: 129 IHPRNEIDKVYLARVKGIATKENLRPLTRGXVIDGKKTKPARYNIVRVEADKSRSIVELT 188

Query: 181 IHEGRNHQVKKMFEQVGLLVDKLSRTQFGTLDLTGLRPGEARRLNKKEISQLHNAAINK 239
IHEGRNHQVKKMFE VGLLVDKLSRT+FGT+DL GLRPGEARRLNKKEISQLHN a k
Sbjct: 189 IHEGRNHQVKKMFESVGLLVDKLSRTRFGTVmK^^ 247

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1536 

A DNA sequence (GBSx1627) was identified in S.agalactiae <SEQ ID 4729> which encodes the amino
acid sequence <SEQ ID 4730>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1476 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05280 GB:AP001512 unknown conserved protein [Bacillus halodurans]

Identities = 75/180 (41%), Positives = 107/180 (58%), Gaps = 10/180 (5%)

Query: 6 SIEALLFVAGEDGLSLRQMAELLSLTPSALIQQLEKLAKRYEEDDDSSLLLLETAQTYKL 65
+IE +LFV G++G++L ++ +LL L+ + LE+L Y D+ L + E A ++L
Sbjct: 9 AIEGILFVRGDEGVTLEELCDLLELSTDVVYAALEELRLSYT-DEARGLRIEEVAHAFRL 67

Query: 66 VTKDSYMTLLRDYAKAPINQSLSRASLEVLSIIAYKQPITRIEIDDIRGVNSSGAITRLI 125

TK + A + + LS+A+LE L+IIAY+QPITRIE+D++RGV S AI L

Sbjct: 68 STKPELAPYPKKLAIiSTLQSGljSQftALETIAIIAYRQPITRlEVDETOGvTCSEKAIQTLT 127

Query: 126 AFGLIKEAGKKEVLGRPNLYETTNYFLDYMGINQLDDL IDASSIELVDEEVSLF 179

+ LIKE G+ + GRP LY TT FLD+ G+ L +L ID SSI EE LP
Sbjct: 128 SRLLIKEVGRAQGTGRPILYGTTPQFLDHFGLKSLKELPPLPEDIDESSI GEEADLF 184



A related DNA sequence was identified in S.pyogenes <SEQ ID 4731> which encodes the amino acid
sequence <SEQ ID 4732>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1062 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 130/179 (72%), Positives = 159/179 (88%)

Query: 1 MTYLGSIEALLFVAGEDGLSLRQKAELLSLTPSALIQQLEKLAKRYEEDDDSSLLLLETA 60
MTYL IEALLFVAGE+GLSLR +A +LSLTP+AL QQLEKL+++YE+D SSL L+ETA
Sbjct: 1 MTYLSQIEALLFVAGEEGLSLRHLASMLSLTPTALQQQLEKLSQKYEKDQHSSLCLIETA 60

Query: 61 QTYKLVTKDSYMTLLRDYAKAPINQSLSRASLEVLSIIAYKQPITRIEIDDIRGVNSSGA 120

TY+LVTK+ + LLR YAK P+NQSLSRASLEVLSI+AYKQPITRIEIDDIRGVNSSGA
Sbjct: 61 NTYRLVTKEGFAELLRAYAKTPMNQSLSRASLEVLSIVAYKQPITRIEIDDIRGVNSSGA 120

Query: 121 ITRLIAFGLIKEAGKKEVLGRPNLYETTNYFLDYMGINQLDDLIDASSIELVDEEVSLF 179

+++L+AF LI+EAGKK+V+GRP+LY TT+YFLDYMGIN LD+LI+ S++E DEE++LF
Sbjct: 121 LSKLLAFDLIREAGKKDVVGRPHLYATTDYFLDYMGINHLDELIEVSAVEPADEEIALF 179

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1537

A DNA sequence (GBSx1628) was identified in S.agalactiae <SEQ ID 4733> which encodes the amino
acid sequence <SEQ ID 4734>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1012 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 3 IKLKDFEGPLDLLLHLVSKYEVDIYDVPIVEVIEQYLAYIATLQAMRLEVAGEYMLMASQ 62

+K+ FEGPLDLLLHL+++ E+DIYD+P+ ++ EQYL Y+ T++ + L++A EY++MA+ 
Sbjct: 6 VKIDTFEGPLDLLLHLINRLEIDIYDIPVAKITEQYLLYVHTMRVLELDIASEYLVMAAT 65 

Query: 63 LMLIKSRNLLPK WESNP I - EDDPEMELLSQLEEYRRFKVLSEELANQHQERAKYF 117 

h+ IKSR LLPK 4- E + E+DP EL+ +L EYR++K +++L + +ER K F 
Sbjct: 66 LLSIKSRMLLPKQEEELFEDELLEEEDPREELIEKLIEYRKYKDAAKDLKEREEERQKSF 125

Query: 118 SKPKQEVIFEDAILLHDKSVMDLFLTFSQMMSQKQKELSNS QTVIEKEDYRIED 171

+KP ++ + +S L +T M+ QK L +T I ++D IE

Sbjct: 126 TKPPSDL--SEYAKEVKQSEQKLSVTVYDMIGAFQKVLKRKKINRPMETTITRQDIPIEA 183

Query: 172 MMIVIERHFNLKKKTT---LQEVFADCQTKSEMITLFLAMLELIKLHQITVEQDSNFSQV 228

M I +LK + T ++F + K ++ FLA+LEL+K + +EQ+ NFS +

Sbjct: 184 RMNEIVH--SLKSRGTRINFMDLF-PYEQKEHLVVTFLAVLELMKNQLVLIEQEHNFSDI 240

Query: 229 ILRKEE 234 

Sbjct: 241 YITGSE 246 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4735> which encodes the amino acid 
sequence <SEQ ID 4736>. Analysis of this protein sequence reveals the following:

Possible site: 60 
»> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.61 Transmembrane 199 - 215 ( 199 - 218) 



Final Results

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB14254 GB:Z99116 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 86/239 (35%), Positives = 145/239 (59%), Gaps = 15/239 (6%)

Query: 3 IKLKDFEGPLDLLLHLVSQYKVDIYEVPIVEVIEQYLNYIETLQVMKLEVAGDYMLMASQ 62 

+K+ FEGPLDLLLHL+++ ++DIY++P+ ++ EQYL Y+ T++V++L++A +Y++MA+ 
Sbjct: 6 VKIDTFEGPLDLLLHLINRLEIDIYDIPVAKITEQYLLYVHTMRVLELDIASEYLVMAAT 65

Query: 63 LMLIKSRRLLPKVVEHI EEDLEQDLLEKIEEYSRFKAVSQALAKQHDQRAKWY 115

L+ IKSR LLPK E + EED ++L+EK+ EY ++K ++ L ++ ++R K +

Sbjct: 66 LLSIKSRMLLPKQEEELFEDELLEEEDPREELIEKLIEYRKYKDAAKDLKEREEERQKSF 125

Query: 116 SKPKQELI - FEDAILQEDK TVMDLFLAFSNIMAAKRAVLKNNHTVIERDDYKIEDM 170 

+KP +L + + Q ++ TV D+ AF ++ K+ + + T I R D IE
Sbjct: 126 TKPPSDLSEYAKEVKQSEQKLSVTVYDMIGAFQKVLKRKK-INRPMETTITRQDIPIEAR 184
Query: 171 MASIKQRLEKENV-IRLSAIFEECQTLNEVISIFLASLELIKLHVVFVEQLSNFGAIIL 228
MI L+ I +F Q + V++ FLA LEL+K +V +EQ NF I +

Sbjct: 185 MNEIVHSLKSRGTRINFMDLFPYEQKEHLVVT-FIAVLELMKNQLVLIEQEHNFSDIYI 242
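The percentages in the `Identities`, `Positives` and `Gaps` headers are derived from counts over the alignment length. Assuming each percentage is truncated to an integer, as it appears to be for the Identities and Gaps values in this section (86/239 gives 35%, 15/239 gives 6%), a quick consistency check can be sketched as follows. The same rule should not be assumed for Positives: several Positives values here, including 145/239 (59%) above, come out one point below the truncated value. The helper names are illustrative.

```python
import re

STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def truncated_percent(count: int, length: int) -> int:
    """Integer percentage rounded toward zero (assumed reporting convention)."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // length


def check_header(line: str) -> dict:
    """Map each statistic on the line to (reported %, recomputed %)."""
    return {
        name: (int(reported), truncated_percent(int(count), int(length)))
        for name, count, length, reported in STAT.findall(line)
    }


header = "Identities = 86/239 (35%), Positives = 145/239 (59%), Gaps = 15/239 (6%)"
print(check_header(header))
# {'Identities': (35, 35), 'Positives': (59, 60), 'Gaps': (6, 6)}
```

A mismatch in the Identities or Gaps pair is a useful hint that a count or length was mis-scanned.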

An alignment of the GAS and GBS proteins is shown below. 

Identities = 156/235 (66%), Positives = 191/235 (80%), Gaps = 2/235 (0%)

Query: 1 MDIKLKDFEGPLDLLLHLVSKYEVDIYDVPIVEVIEQYLAYIATLQAMRLEVAGEYMLMA 60
MDIKLKDFEGPLDLLLHLVS+Y+VDIY+VPIVEVIEQYL YI TLQ M+LEVAG+YMLMA

Sbjct: 1 MDIKLKDFEGPLDLLLHLVSQYKVDIYEVPIVEVIEQYLNYIETLQVMKLEVAGDYMLMA 60

Query: 61 SQLMLIKSRNLLPKVVESNPIEDDPEMELLSQLEEYRRFKVLSEELANQHQERAKYFSKP 120 
SQLMLIKSR LLPKVVE IE+D E +LL ++EEY RFK +S+ LA QH +RAK++SKP
Sbjct: 61 SQLMLIKSRRLLPKVVEH--IEEDLEQDLLEKIEEYSRFKAVSQALAKQHDQRAKWYSKP 118

Query: 121 KQEVIFEDAILLHDKSVMDLFLTFSQMMSQKQKELSNSQTVIEKEDYRIEDMMIVIERHF 180 

KQE+IFEDAIL DK+VMDLFL FS +M+ K+ L N+ TVIE++DY+IEDMM I++ 
Sbjct: 119 KQELIFEDAILQEDKTVMDLFLAFSNIMAAKRAVLKNNHTVIERDDYKIEDMMASIKQRL 178

Query: 181 NLKKKTTLQEVFADCQTKSEMITLFLAMLELIKLHQITVEQDSNFSQVILRKEEK 235

+ L +F +CQT +E+I++FLA LELIKLH + VEQ SNF +ILRKE+K 
Sbjct: 179 EKENVIRLSAIFEECQTLNEVISIFLASLELIKLHVVFVEQLSNFGAIILRKEKK 233
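In BLAST pairwise output the middle line carries the residue letter where query and subject are identical, a `+` where they differ but score positively in the substitution matrix, and a blank otherwise, so Positives always includes Identities. The sketch below counts both from a middle line and cross-checks the identity count against the two aligned sequences. The strings are the first row of the GAS/GBS alignment above, with the scanning errors in that row ("AyiA", "MD1K") corrected and the blanks of the middle line restored from the sequences; the function names are illustrative.

```python
def midline_counts(midline: str) -> tuple:
    """Return (identities, positives) from a BLAST middle line."""
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


def identity_count(query: str, subject: str) -> int:
    """Count identical, non-gap columns in two aligned rows of equal length."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


QUERY = "MDIKLKDFEGPLDLLLHLVSKYEVDIYDVPIVEVIEQYLAYIATLQAMRLEVAGEYMLMA"
MIDLINE = "MDIKLKDFEGPLDLLLHLVS+Y+VDIY+VPIVEVIEQYL YI TLQ M+LEVAG+YMLMA"
SUBJECT = "MDIKLKDFEGPLDLLLHLVSQYKVDIYEVPIVEVIEQYLNYIETLQVMKLEVAGDYMLMA"

print(midline_counts(MIDLINE))         # (52, 57)
print(identity_count(QUERY, SUBJECT))  # 52
```

Summing these per-row counts over every row of an alignment should reproduce the header totals, provided no row was damaged in scanning.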

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1538 

A DNA sequence (GBSx1629) was identified in S.agalactiae <SEQ ID 4737> which encodes the amino
acid sequence <SEQ ID 4738>. This protein is predicted to be pXO1-18. Analysis of this protein sequence
reveals the following:

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.14 Transmembrane 128 - 144 ( 127 - 145) 

Final Results

bacterial membrane --- Certainty=0.2657 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05248 GB:AP001512 integrase/recombinase [Bacillus halodurans] 
Identities = 67/271 (24%), Positives = 117/271 (42%), Gaps = 35/271 (12%) 





Query: 11 LKTMINDINNFIESKK LS[...] 58
++T+ N++ F+ +K LS N+ +SY DLKQ+ + + ++ E + Y
Sbjct: 1 METVNNNLQQFLHFQKVERGLSNNTICSYGRDLKQYIQYVERVEEIRSAKNITRETILHY 60

Query: 59 QQSLSEFKL--TARKRKLSAVNQFLFFLYNRGTLKEFYRL QETEKITLRQTKSQI 111
L E T+ R ++A+ F FL + + T+++ AT ++
Sbjct: 61 LYHLREQGRAETSIARAVAAIRSFHQFLLREKLSDSDPTVHVEIPKATKRLPKALTIEEV 120

Query: 112 MDLSNFYQDTDYPSGRLIALLIL--SLGLTPAEIANLKKADFDTTFNILS-IEKSQMKRI 168
L N Q D S R A+L L + G+ +E+ L +D + + + K +RI
Sbjct: 121 EALLNSPQGRDPFSLRNKAMLELIYATGMRVSELIGLTLSDIHLSMGFVRCLGKGNKERI 180

Query: 169 LKLPEDLLPFLLESLEEDGDLVF-EHNGKPYSRQWFFNQLTDFLNEKN-E 216
+ + + + +ES +G D VF H+G+P SRQ F+ L N +
Sbjct: 181 IPIGQ-VATEAVESYLANGRGKLNIGCCSHDHVFVNHHGRPLSRQGFWKMLKQLAKNVNID 239

Query: 217 QQLTAQLLREQFILKQKENGKTMTELSRLLG 247
+ LT LR F ENG + + +LG
Sbjct: 240 KPLTPHTLRHSFATHLLENGADLRAVQEMLG 270

A related DNA sequence was identified in S.pyogenes <SEQ ID 4739> which encodes the amino acid
sequence <SEQ ID 4740>. Analysis of this protein sequence reveals the following:

Possible site: 20 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.90 Transmembrane 111 - 127 ( 110 - 127) 

Final Results

bacterial membrane --- Certainty=0.1362 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/243 (48%), Positives = 167/243 (68%), Gaps = 1/243 (0%)

Query: 18 INNFIESKKLSLNSRKSYHYDLKQFYKIIGGHVNSEKLALYQQSLSEFKLTARKRKLSAV 77
I FI SK LS NS+K+Y YDL+QF ++IG VN +KL LYQ S++ L+A+KRKLS

Sbjct: 5 IEPFIASKALSQNSQKAYRYDLQQFCQLIGERVNQDKLLLYQNSIANLSLSAKKRKLSTA 64

Query: 78 NQFLFFLYNRGTLKEFYRLQETEKITLAQTK-SQIMDLSNFYQDTDYPSGRLIALLILSL 136 
NQFL++LY L ++RL +T K+ + + + I++ FYQ T + G+LI+LLIL L
Sbjct: 65 NQFLYYLYQIKYLNSYFRLTDTMKVMRTEKQQAAIINTDIFYQKTPFWGQLISLLILEL 124

Query: 137 GLTPAEIANLKKADFDTTFNILSIEKSQMKRILKLPEDLLPFLLESLEEDGDLVFEHNGK 196

GLTP+E+A ++ A+ D F +L+++ + R+L L + L+PFL + L +FEH G 

Sbjct: 125 GLTPSEVAGIEVANLDLNFQMLTLKTKKGVRVLPLSQILIPFLEQQLVGKEVYLFEHRGI 184

Query: 197 PYSRQWFFNQLTDFLNEKNEQQLTAQLLREQFILKQKENGKTMTELSRLLGLKTPITLER 256 

P+SRQWFFN L F+ + LTAQ LREQFILK+K GK++ ELS +LGLK+P+TLE+ 

Sbjct: 185 PFSRQWFFNHLKTFTOSIGYEGLTAQKLREQFILKEKLAGKSIIELSDILGLKSPMTLEK 244 

Query: 257 YYR 259

YY+
Sbjct: 245 YYK 247 

SEQ ID 4738 (GBS383) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 68 (lane 7; MW 32kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 5; MW 57.1kDa).
The GBS383-GST fusion product was purified (Figure 212, lane 8) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 308), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1539 

A DNA sequence (GBSx1630) was identified in S.agalactiae <SEQ ID 4741> which encodes the amino
acid sequence <SEQ ID 4742>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2465 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05201 GB:AP001512 unknown conserved protein in B. subtilis
[Bacillus halodurans]
Identities = 38/136 (27%), Positives = 73/136 (52%), Gaps = 1/136 (0%)





Query: 7 ESFLDNHLDHYLIPAEDVAIFVDTHNADHVMLLLASNGFSRVPVITKEKKYVGTISISDI 66
++ + N L +IP E VA ++ +H +L+L +G++ +PV+ + K G IS S I
Sbjct: 7 QNIMDNDLKELVIPFEKVAHVHLSNPLEHALLVLIKSGYTAIPVLDEHSKLHGVISKSLI 66

Query: 67 MGYQSKGQLTDWE-MAQTDIVEMVNTKIEPINEAATLTAIMHKIVDYPFLPVISDQNDFR 125
+ + + E +A + +++N +I I+ A+ + + + +PF+ ++ D F
Sbjct: 67 LDALLGTORIEMERLAHLVVKDVMNPEIPTIHHKASFSRALKVSIAHPFICILDDDGSFL 126

Query: 126 GIITRKSILKAINSLL 141
GI+TR +IL IN L
Sbjct: 127 GILTRSTILSFINRQL 142






A related DNA sequence was identified in S.pyogenes <SEQ ID 4743> which encodes the amino acid 
sequence <SEQ ID 4744>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3539 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 119/153 (77%), Positives = 137/153 (88%)

Query: 1 MIAKEFESFLDNHLDHYLIPAEDVAIFVDTHNADHVMLLLASNGFSRVPVITKEKKYVGT 60

MIAKEFE+FL++HLD+YLIP +D+AIF+DTHNADHVMLLL SNGFSRVPVIT+EKKYVGT
Sbjct: 1 MIAKEFETFLMSHLDNYLIPEQDLAIFIDTHNADHVMLLLVSNGFSRVPVITREKKYVGT 60 

Query: 61 ISISDIMGYQSKGQLTDWEMAQTDIVEMVNTKIEPINEAATLTAIMHKIVDYPFLPVISD 120 

ISISDIM YQSK QLTDWEM+QTDI EMVNTKIE I+ ++LT IMHK++D+PFLPV+
Sbjct: 61 ISISDIimYQSKRQLTDWEMSQTDIGEMVNTKIETISITSSLTEIMHKLIDFPFLPVVDR 120 

Query: 121 QNDFRGIITRKSILKAINSLLHDFTDEYTITPK 153 

N F GIITRKSILKA+NSLLHDFTD+YTI K
Sbjct: 121 ANRFVGIITRKSILKAVNSLLHDFTDDYTIIKK 153 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1540 

A DNA sequence (GBSx1631) was identified in S.agalactiae <SEQ ID 4745> which encodes the amino
acid sequence <SEQ ID 4746>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4421 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the GENPEPT database.



Query: 5 KLVVMSDSHGDRDIVKDIKNHYLGKVDAIFHNGDSELPSSDPIWEGIHVVTGNCDYDSGY 64

KL+++SDSHG D +K + + + +VDAI H GDSELP D EG+++V GNCD+ +
Sbjct: 2 KLLILSDSHGWSDELKAVADKHRQEVDAIIHCGDSELPRDDRALEGMNIVRGNCDFGVDF 61

Query: 65 PEVLVTKIDNAVIVQTHGHLHQINFTWDKLDLLAQQEDADICLYGHLHRADAWKNGKTIF 124

PE + + + + THGHL+ + ++ L A++ A + +GH H A +++ +F
Sbjct: 62 PEDFIKTVGDFNVYVTHGHLYNVKMSYVSLTYRAEEVGAQLVCFGHSHVATSFQENGIVF 121 

Query: 125 INPGSVLQPRGPINEKLYAVVTITDSKVLVEYYTRQHQPYPNLTKELSR 173

+NPGS+ PR E+Y+ + D++ + + R +L + R 

Sbjct: 122 VNPGSLRLPRNR-KEQTYCLAYVRDDQIELTFLDRDGHEVTDLQRTYLR 169 
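Within one alignment row, the number of residues (gap characters excluded) should equal end - start + 1. This simple check exposes many scanning errors in alignments like these: for instance, the Query 125 row above as originally scanned, with `...LYAWTITD...`, holds 48 residues for a 49-residue span (125 to 173), which is consistent with the letter pair `VV` having been read as `W`. The sketch assumes a single-line row of the form `Query: start SEQUENCE end` on the forward strand; `check_row` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+([A-Za-z*-]+)\s+(\d+)$")


def check_row(line: str) -> Optional[Tuple[bool, int, int]]:
    """Return (consistent?, residues counted, residues implied by coordinates)."""
    m = ROW.match(line.strip())
    if m is None:
        return None  # not a single-line alignment row
    _label, start, seq, end = m.groups()
    residues = len(seq.replace("-", ""))
    expected = int(end) - int(start) + 1
    return residues == expected, residues, expected


print(check_row("Sbjct: 122 VNPGSLRLPRNR-KEQTYCLAYVRDDQIELTFLDRDGHEVTDLQRTYLR 169"))
# (True, 48, 48)
print(check_row("Query: 125 INPGSVLQPRGPINEKLYAWTITDSKVLVEYYTRQHQPYPNLTKELSR 173"))
# (False, 48, 49)
```

A failed check only says that a character was lost or gained; deciding which one still needs the middle line or a related alignment.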

A related DNA sequence was identified in S.pyogenes <SEQ ID 4747> which encodes the amino acid 
sequence <SEQ ID 4748>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3835 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 117/173 (67%), Positives = 143/173 (82%)

Query: 1 MAIRKLVVMSDSHGDRDIVKDIKNHYLGKVDAIFHNGDSELPSSDPIWEGIHVVTGNCDY 60

MA + ++VMSDSHGDRDIV+ IK+ YLG+VDAIFHNGDSEL SSDPIW GI+VV GNCDY
Sbjct: 1 MASKTIIVMSDSHGDRDIVQAIKDKYLGQVDAIFHNGDSELNSSDPIWAGIYVVGGNCDY 60

Query: 61 DSGYPEVLVTKIDNAVIVQTHGHLHQINFTWDKLDLLAQQEDADICLYGHLHRADAWKNG 120

D+GYP+ LVT++ I QTHGHL+ INFTWDKLD AQ+ ADICLYGHLHR AW+ G 
Sbjct: 61 DTGYPDRLVTQLGTVTIAQTHGHLYHINFTWDKLDYFAQEVVADICLYGHLHRPAAWQVG 120

Query: 121 KTIFINPGSVLQPRGPINEKLYAVVTITDSKVLVEYYTRQHQPYPNLTKELSR 173 

+T+F+NPGSV QPRG INEKLYA V +TD+++ V+Y+TR H+ YP+L+KE R 
Sbjct: 121 QTLFMNPGSVTQPRGEINEKLYARVELTDTQIKVDYFTRDHKLYPSLSKEFKR 173

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1541 

A DNA sequence (GBSx1632) was identified in S.agalactiae <SEQ ID 4749> which encodes the amino
acid sequence <SEQ ID 4750>. This protein is predicted to be a HAM1 family protein. Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1218 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14796 GB:Z99118 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 96/189 (50%), Positives = 130/189 (67%)



Query: 128 LIATHNEGKTKEFRELFGKLGLKVENLNDYPDLPEVEETGMTFEENARLKAETISKLTGK 187
+IATHN GK KEF+E+ G V++L + E+EETG TFEENA +KAE ++K K
Sbjct: 8 IIATHNPGKVKEFKEILEPRGYDVKSLAEIGFTEEIEETGHTFEENAIMKAEAVAKAVNK 67

Query: 188 MVISDDSGLKVDALGGLPGVVSARFSGPDATDANNMAKLLHELAMVFDKERRSAQFHTTL 247
MVI+DDSGL +D LGG PGV+SAR++G D N K+L EL + +KE+R+A+F L
Sbjct: 68 MVIADDSGLSIDNLGGRPGVYSARYAGEQKDDCANIEKVLSELKGI-EKEQRTARFRCAL 126



A related DNA sequence was identified in S.pyogenes <SEQ ID 4751> which encodes the amino acid
sequence <SEQ ID 4752>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2590 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 214/325 (65%), Positives = 253/325 (77%), Gaps = 5/325 (1%)



Query: 1 ...
W+ S L+ F++DMINQE R IKVT H G IL+ E+ +L V+LP+ G E
Sbjct: 14 ...

Query: 57 ...
Sbjct: 74 ...

Query: 117 ...
FGD I LI AT NEGKTKEFR LFG+LG +VENLNDYP+LPEV ETG TFEENARL
Sbjct: 133 ...

Query: 177 ...
KAETIS+LTGHW++DDSGLKVDALGGLEGWSARFSGPDATDA+NNAKLLHELAMVFD+
Sbjct: 193 ...

Query: 237 ...

Query: 297 ...
A +KN LSHRGQAVRKLMEVFP WQ
Sbjct: 313 ADQKNQLSHRGQAVRKLMEVFPAWQ 337

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1542

A DNA sequence (GBSx1633) was identified in S.agalactiae <SEQ ID 4753> which encodes the amino
acid sequence <SEQ ID 4754>. This protein is predicted to be glutamate racemase (murI). Analysis of this
protein sequence reveals the following:

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.86 Transmembrane 114 - 130 ( 114 - 130)



Final Results 

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10067> which encodes amino acid sequence <SEQ ID 
10068> was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF72713 GB:AF263927 glutamate racemase [Carnobacterium sp. St2] 
Identities = 160/267 (59%), Positives = 202/267 (74%), Gaps = 3/267 (1%) 



Query: 27 MDSRPIGFLDSGVGGLTVVKEMFRQLPEEEVIFIGDQARAPYGPRPAQQIREFTWQMVNF 86
M + IGF+DSGVGGLTVVKE RQLP E + ++GD AR PYGPRP Q+R+FTW+M +F
Sbjct: 1 MKKQAIGFIDSGVGGLTVVKEAMRQLPNESIYYVGDTARCPYGPRPEDQVRKFTWEMTHF 60

Query: 87 LLTKNVKMIVIACNTATAVAWQEIKEKLDIPVLGVIL[...] 146
LL KN+KM+VIACNTATA A ++IK+KL IPV+GVILPG+ AAIK+T+ ++G+IGT T
Sbjct: 61 LLDKNIKMLVIACNTATAAALKDIKKKLAIPVIGVILPGSRAAIKATHTNRIGVIGTEGT 120

Query: 147 VKSDAYRQKIQALSPNTAVVSLACPKFVPIVESNQMSSSIAKKVVYETLSPLVGK-LDTL 205
VKS+ Y++ I + V SLACPKFVP+VESN+ SS++AKKVV ETL PL + LDTL
Sbjct: 121 VKSNQYKKMIHSKDTKALVTSIACPKFVPLVESNEYSSAIAKKVVAETLRPLKNEGLDTL 180

Query: 206 ILGCTHYPLLRPIIQNVMGAEVKLIDSGAETVRDISVLLNYFEIM5NWQNKH-GGHHFYT 264
ILGCTHYPLLRPIIQN +G V LIDSGAETV ++S +L+YF + + QNK +FYT
Sbjct: 181 ILGCTHYPLLRPIIQNTLGDSVTLIDSGAETVSEVSTILDYFNLAVDSQNKEKAERNFYT 240

Query: 265 TASPKGFKEIAEQWLS-QEINVERIVL 290
T S + F IA +WL ++ VE I L
Sbjct: 241 TGSSQMFHAIASEWLQLDDLAVEHITL 267





A related DNA sequence was identified in S.pyogenes <SEQ ID 4755> which encodes the amino acid
sequence <SEQ ID 4756>. Analysis of this protein sequence reveals the following:

Possible site: 19 

>>> Seems to have an uncleavable N-term signal seq

1 - 104 ( 86 - 104) 



Final Results 

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

[Carnobacterium sp. St2] 
202/267 (74%), Gaps = 3/267 (1%)

Query: 1 MDTRPIGFLDSGVGGLTVVCELIRQLPHEKIVYIGDSARAPYGPRPKKQIKEYTWELVNF 60
M + IGF+DSGVGGLTVV E +RQLP+E I Y+GD+AR PYGPRP+ Q++++TWE+ +F

Sbjct: 



Query: 61 LLTQNVKMIVFACNTATAVAWEEVKAALDIPVLGVVLPGASAAIKSTTKGQVGVIGTPMT 120
LL +N+KM+V ACNTATA A +++K L IPV+GV+LPG+ AAIK+T ++GVIGT T
Sbjct: 61 LLDKNIKMLVIACNTATAAALKDIKKKLAIPVIGVILPGSRAAIKATHTNRIGVIGTEGT 120

Query: 121 VASDIYRKKIQLLAPSIQVVSLACPKFVPIVESNEMCSSIAKKIVYDSLAPLVGK-IDTL 179

V S+ Y+K I V SLACPKFVP+VESNE S+IAKK+V ++L PL + +DTL

Sbjct: 121 VKSNQYKKMIHSKDTKALVTSIACPKFVPLVESNEYSSAIAKKVVAETLRPLKNEGLDTL 180

Query: 180 VLGCTHYPLLRPIIQNVMGPSVVLIDSGAECVRDISVLLNYFDIN-GNYHQKAVEHRFFT 238

+LGCTHYPLLRPIIQN +G SV LIDSGAE V ++S +L+YF++ + +++ E F+T
Sbjct: 181 ILGCTHYPLLRPIIQNTLGDSVTLIDSGAETVSEVSTILDYFNLAVDSQNKEKAERNFYT 240

Query: 239 TANPEIFQEIASIWLK-QKINVEHVTL 264

T + ++F IAS WL+ + VEH+TL 
Sbjct: 241 TGSSQMFHAIASEWLQLDDLAVEHITL 267 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 195/264 (73%), Positives = 231/264 (86%)



V SD YR+KIQ L+P+ V SLACPKFVPIVESN+M SS+AKK+VY++L+PLVGK+DTL+

LGCTHYPLLRPIIQNVMG VKLIDSGAE VRDISVLLNYF+IN N+ K H F+TTA

+P+ F+EIA WL Q+INVE + L
NPEIFQEIASIWLKQKINVEHVTL 264

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1543 

A DNA sequence (GBSx1634) was identified in S.agalactiae <SEQ ID 4757> which encodes the amino
acid sequence <SEQ ID 4758>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.36 Transmembrane 3 - 19 ( 1-27) 




Final Results 

bacterial membrane --- Certainty=0.5543 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MSITIWILLIIVALFGGLVGGIFIARKQIEKEIGEHPRLTPDAIREMMSQMGQKPSEAKV 60
M++ + IL+ +VAL G+ G FIARK + + ++P + +R MM QMG KPS+ K+
Sbjct: 1 MTLWVGILVGVVALLIGVALGFFIARKYMMSYLKKNPPINEQMLRMMMMQMGMKPSQKKI 60



Query: 61 QQTYRNIVKHAK 72
Q + + K
Sbjct: 61 NQMMKAMNNQTK 72 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4759> which encodes the amino acid 
sequence <SEQ ID 4760>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.72 Transmembrane 7 - 23 ( 1-27) 

Final Results 

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 62/79 (78%), Positives = 69/79 (86%)

Query: 1 MSITIWILLIIVALFGGLVGGIFIARKQIEKEIGEHPRLTPDAIREMMSQMGQKPSEAKV 60 

MS IWILL+IVAL G+ GGIFIARKQIEKEIGEHPRLTP+AIREMMSQMGQKPSEAK+ 
Sbjct: 1 MSTAIWILLLIVALGVGVFGGIFIARKQIEKEIGEHPRLTPEAIREMMSQMGQKPSEAKI 60

Query: 61 QQTYRNIVKHAKTAIKTKK 79

QQTYRNI+K +K A+ K 
Sbjct: 61 QQTYRNIIKQSKAAVSKGK 79 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1544 

A DNA sequence (GBSx1635) was identified in S.agalactiae <SEQ ID 4761> which encodes the amino
acid sequence <SEQ ID 4762>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial membrane --- Certainty=0.4142 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for vaccines or diagnostics.



Example 1545 

A DNA sequence (GBSx1636) was identified in S.agalactiae <SEQ ID 4763> which encodes the amino
acid sequence <SEQ ID 4764>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.83 Transmembrane 56 - 72 ( 50 - 105) 

INTEGRAL Likelihood = -7.27 Transmembrane 27 - 43 ( 17 - 

INTEGRAL Likelihood = -6.26 Transmembrane 76 - 92 ( 73 - 105)

INTEGRAL Likelihood = -4.83 Transmembrane 119 - 135 ( 118 - 

INTEGRAL Likelihood = -1.65 Transmembrane 160 - 176 ( 160 - 
Final Results

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8837> which encodes amino acid sequence <SEQ ID 8838> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4765> which encodes the amino acid 
sequence <SEQ ID 4766>. Analysis of this protein sequence reveals the following: 



Possible site: 19
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood =-10 Transmembrane 45 - 61 ( 37 -
INTEGRAL Likelihood = -7.06 Transmembrane 74 - 90 ( 62 -
INTEGRAL Likelihood = -3.45 Transmembrane 110 - 126 ( 108 -
INTEGRAL Likelihood = -2.18 Transmembrane 149 - 165 ( 149 - 165)
INTEGRAL Likelihood = -1.91 Transmembrane 21 - 37 ( 20 - 37)



Final Results 

bacterial membrane --- Certainty=0.5394 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 111/173 (64%), Positives = 145/173 (83%) 

Query: 12 MSKKTTQMVSYTSILVAFAIMIPIIMPAKIIIGPASFTLASHVPLFLSIFISVPVAILVA 71

M+KK TQ+++YTSILVAFAI+IPIIMP K+IIGPASFTLASHVPLFL+IF+S+PVAILVA 
Sbjct: 1 MTKKETQLIAYTSILVAFAILIPIIMPLKLIIGPASFTLASHVPLFLAIFMSIPVAILVA 60 

Query: 72 LGTGLGFLLAGFPIVIVLRALSHIGFALIAAFLIKSKPSLLMSKWQTLLFAVAINIIHGL 131 

LGT LGFLLAG P++IVLRALSH+ FA++AA+ + KP L+ S + FA IN+IHGL
Sbjct: 61 LGTTLGFLLAGLPLIIVLRALSHLLFAILAAWiLSRKPQLMTSAVKCFSFAFFINVIHGL 120 

Query: 132 LEFITVYIITMTSNSSSTYLWSLFSLIGLGSLLHGLVDFYIALFIWKWMTQKL 184

EF+ VYI+T T+ +S +Y WS+ LIGLGSL+HG++DFY+AL +W+++ + L 
Sbjct: 121 AEFLVVYILTATTATSMSYFWSMLGLIGLGSLIHGILDFYLALVLWRFLAKNL 173

A related GBS gene <SEQ ID 10789> and protein <SEQ ID 10790> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
SRCFLG : 0 

McG: Length of UR: 24

Peak Value of UR: 3.16 
Net Charge of CR: 2 
McG: Discrim Score: 12.56 
GvH: Signal Score (-7.5): -0.16 

Possible site: 19 
»> Seems to have a cleavable N-term signal seq. 
Amino Acid Composition: calculated from 20 
ALOM program count: 5 value: -10.83 threshold: 0.0 

INTEGRAL Likelihood =-10.83 Transmembrane 45 - 61 ( 39 - 

INTEGRAL Likelihood = -6.26 Transmembrane 65 - 81 ( 62 - 

INTEGRAL Likelihood = -4.83 Transmembrane 108 - 124 ( 107 - 

INTEGRAL Likelihood = -1.65 Transmembrane 149 - 165 ( 149 - 

INTEGRAL Likelihood = -0.27 Transmembrane 24 - 40 ( 24 - 

PERIPHERAL Likelihood =0.42 86 
*** Reasoning Step: 3

Final Results 

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1546 

A DNA sequence (GBSx1637) was identified in S.agalactiae <SEQ ID 4767> which encodes the amino
acid sequence <SEQ ID 4768>. This protein is predicted to be a transcriptional regulator of the biotin repressor
family. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.2237 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14749 GB:Z99118 yrxA [Bacillus subtilis] 
Identities = 72/165 (43%), Positives = 112/165 (67%), Gaps = 2/165 (1%)

Query: 6 RRENILTTLKGTKEAISASTLAKIFSVSRQVIVGDIALLRAQQCDIISTPKGYL-MSSAL 64 
RR+ +L LK +K ++ LAK +VSRQVTV DI+LL+A+ II+T +GY+ M +A

Sbjct: 12 RRDQLLL^KESKSPLTGGELAKKANVSRQVIVQDISLLKAKIWPIIATSOGYVYMDAAA 71

Query: 65 STHQFTARLV-CQHGIEQTEEELEIILRYQGIIMNVEVEHPIYGMLTAPLNIQSQKDIDN 123
HQ R++ C HG E+TEEEL++I+ + +V++EHP+YG LTA + + ++K++ + 

Sbjct: 72 QQHQQAERIIACLHGPERTEEELQLIVDEGVTVKDVKIEHPVYGDLTAAIQVGTRKEVSH 131

Query: 124 FTAKLKVSNAELLSSLTDGLHTHMISCQDQSVFDQICEALKKAGI 168 

F K+ +NA LS LTDG+H H ++ D+ DQ C+AL++AGI 
Sbjct: 132 FIKKINSTNAAYLSQLTDGVHLHTLTAPDEHRIDQACQALEEAGI 176 


A related DNA sequence was identified in S.pyogenes <SEQ ID 4769> which encodes the amino acid 
sequence <SEQ ID 4770>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2971 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 109/170 (64%), Positives = 136/170 (79%)

Query: 1 MKAQERRENILTTLKGTKEAISASTLAKIFSVSRQVIVGDIALLRAQQCDIISTPKGYLM 60 

MKA++RR+ I+ L ++A+SA+ L K+ VSRQVTVGDIALLRAQQ DIISTPKGY+M
Sbjct: 1 MKAEDRRQKIIECLNSEQKAVSATRLGKLLGVSRQVIVGDIALLRAQQIDIISTPKGYIM 60

Query: 61 SSALSTHQFTARLVCQHGIEQTEEELEIILRYQGIIMNVEVEHPIYGMLTAPLNIQSQKD 120

Query: 121 IDNFTAKLKVSNAELLSSLTDGLHTHMISCQDQSVFDQICEALKKAGILY 170

+ NF +KL S AELLSSLT+GLH+H+ISC Q F I L+ AGILY 
Sbjct: 121 VTNFMSKLSQSKAELLSSLTEGLHSHLISCPSQEAFLAIKHDLELAGILY 170 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1547

A DNA sequence (GBSx1638) was identified in S.agalactiae <SEQ ID 4771> which encodes the amino
acid sequence <SEQ ID 4772>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood 
Likelihood 
Likelihood = -7 
Likelihood = -5. 
Likelihood = -4. 
Likelihood = -2 
Likelihood 



Transmembrane 
Transmembrane 
Transmembrane 



Transmembrane 113 -

Final Results

bacterial membrane --- Certainty=0.4376 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10069> which encodes amino acid sequence <SEQ ID 
10070> was also identified. 



The protein has homology with the following sequences in the GENPEPT database.



>GP:AAC18360 GB:AF064763 putative membrane spanning protein
[Lactococcus lactis subsp. cremoris]



Query: 38 IMLYMFPQNMIAIMQKMPGLYFGAIILELVLVFVASGAARRNTPAALPLFLIYSALNGFT 97 

IM+ F NM AI+Q 1+ LV+V G A +N+ ALP+F+ Y+A GF 

Sbjct: 1 IMITFFLDNMRAILQSGSLFLLVLWIIPLVMVVSLCGLAMKNSKMALPIFIGYAAFMGFL 60

Query: 98 LSFIIARYTQTTVLQAFITSAAVFFAMALIGAKTKKDLSGMRKALMAALIGILIASLVNL 157
+SF + YT T + AFIT++A+FF +++ G TK++LSGM KAL A+ G+++A L+NL
Sbjct: 61 ISFTLLMYTATDITIAFITASAMFFGLSVYGRFTKRNLSGMGKALGVAVWGLIVAMLLNL 120



Query: 158 FIGSGGMSYIISIVCVIIFSGLIAYDNQMIKYVYNSQGGQVADGWAVSMALSLYLDFINL 217
F S G++ +IS+V V+IFSGLIA+DNQ I VYN+ GQV+DGWA+SMALSLYLDFIN+
Sbjct: 121 FFASTGLTILISLVGVVIFSGLIAWDNQKITQVYNAHNGQVSDGWAISMALSLYLDFINM 180



Query: 218 FLNILRLF 225
FL +LRLF
Sbjct: 181 FLFLLRLF 188



A related DNA sequence was identified in S.pyogenes <SEQ ID 4773> which encodes the amino acid 
sequence <SEQ ID 4774>. Analysis of this protein sequence reveals the following: 



Possible site: 40 
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.97 Transmembrane 143 - 159 ( 138 - 165)
INTEGRAL Likelihood = -5.89 Transmembrane 164 - 180 ( 150 - 184)
INTEGRAL Likelihood = -5.61 Transmembrane 56 - 72 ( 55 - 77)
INTEGRAL Likelihood = -4.71 Transmembrane 113 - 129 ( 110 - 130)
INTEGRAL Likelihood = -2.81 Transmembrane 203 - 219 ( 203 - 222)
INTEGRAL Likelihood = -2.75 Transmembrane 24 - 40 ( 23 - 41)
INTEGRAL Likelihood = -2.76 Transmembrane 86 - 102 ( 86 - 104)



Final Results

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC18360 GB:AF064763 putative membrane spanning protein 
[Lactococcus lactis subsp. cremoris] 
Identities = 90/189 (47%), Positives = 133/189 (69%)



Query: 38 ...
+M+ F +N+ +IL + + II L++V A KN+ ALPIF+ Y + A GF
Sbjct: 1 IMITFFLDNMRAILQSGSLFLLVLWIIPLVMVVSLCGLAMKNSKMALPIFIGYAAFMGFL 60

Query: 98 ...
AF++++A+FF +S+ G TKR++EG+ KA+ A+ G++VA L+NL
Sbjct: 61 ...

Query: 158 ...
+IS++ V+IFSGLIA DNQ I + VY A NGQV DGWA+ +MALSLYLDF IN+
Sbjct: 121 ...

Query: 218 ...
Sbjct: ...



An alignment of the GAS and GBS proteins is shown below. 

Identities = 167/229 (72%), Positives = 202/229 (87%)

Query: 1 MNDNVIYTQSDSGLNQFFAKIYGLVGIGVGLSAVVSAIMLYMFPQNMIAIMQKMPGLYFG 60

MND+VIYTQSD GLNQFFAKIY LVG+GVGLSA VS +MLY F +N+I+I+ P +Y+G 
Sbjct: 1 MNDHVIYTQSDVGLNQFFAKIYSLVGMGVGLSAFVSYLMLYPFRENB1SILVNQPMIYYG 60 

Query: 61 AIILELVLVFVASGAARRNTPAALPLFLIYSALNGFTLSFIIARYTQTTVLQAFITSAAV 120 

A I+EL+LVFVAS AAR+NTPAALP+FLIYSALNGFTLSFII Y QTTV QAF++SAAV
Sbjct: 61 AAIIELILVFVASSAARKNTPAALPIFLIYSALNGFTLSFIIVAYAQTTVFQAFLSSAAV 120 

Query: 121 FFAMALIGAKTKKDLSGMRKALMAALIGILIASLVNLFIGSGGMSYIISIVCVIIFSGLI 180

FFAM++IG KTK+D+SG+RKA+ AALIG+++ASL+NLFIGSG MSY+IS++ V+IFSGLI
Sbjct: 121 FFAMSIIGVKTKRDMSGLRKAMFAALIGVVVASLINLFIGSGMMSYVTSVISVLIFSGLI 180

Query: 181 AYDNQMIKYVYNSQGGQVADGWAVSMALSLYLDFINLFLNILRLFARND 229

A DNQMIK VY + GQV DGWAV+MALSLYLDFINLF+++LR+F RND 
Sbjct: 181 ASDNQMIKRVYQATNGQVGDGWAVAMALSLYLDFINLFISLLRIFGRND 229 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1548 

A DNA sequence (GBSx1639) was identified in S.agalactiae <SEQ ID 4775> which encodes the amino
acid sequence <SEQ ID 4776>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2495 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
A related GBS nucleic acid sequence <SEQ ID 10071> which encodes amino acid sequence <SEQ ID
10072> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4777> which encodes the amino acid 
sequence <SEQ ID 4778>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3277 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 127/163 (77%), Positives = 141/163 (85%)
Sbjct: 3 

Query: 67 RGGLLHDFFYYDWRVTKFNKSHAWVHPRIAVRNARKLTDLNAREEDIILKHMWGATIAPP 126

RGGLLHDFFYYDWRVTKFNK HAWVHPRIAVRNA+KLT+LN +EEDIILKHMWGATIA P 
Sbjct: 63 RGGLLHDFFYYDWRVTKFNKGHAWVHPRIAVRNAKKLTELNKKEEDIILKHMWGATIAFP 122

Query: 127 RYKESYIVTMVDKYWAWEASRPLKRIFKKPIRFSRKFLGSHN 169 

RYKESYIVTMVDKYWAV+EA PL++ + RK L SHN

Sbjct: 123 RYKESYIVTMVDKYWAVKEAVTPLRQKWSNRRFLRRKTLQSHN 165

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
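The `Identities` and `Positives` entries in the alignment headers above give a count over the aligned length together with a percentage (for `Positives`, residues with a positive substitution score, which includes the identities). As a minimal illustrative sketch, not part of the original analysis, the following Python reads those fields from a header line. The regular expression assumes the `label = count/length (percent%)` layout used in this document, and `parse_summary` is a hypothetical helper name.

```python
import re

# Matches fragments such as "Identities = 127/163 (77%)"; the
# label/count/length/percent layout is assumed from the headers shown here.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line: str) -> dict[str, tuple[int, int, int]]:
    """Return {label: (count, aligned_length, reported_percent)}."""
    return {
        label: (int(n), int(length), int(pct))
        for label, n, length, pct in _FIELD.findall(line)
    }


if __name__ == "__main__":
    summary = parse_summary("Identities = 127/163 (77%), Positives = 141/163 (85%)")
    for label, (n, length, pct) in summary.items():
        # Recompute the fraction from the raw counts next to the reported integer percent.
        print(f"{label}: {n}/{length} = {n / length:.3f} (reported {pct}%)")
```

Recomputing `count / length` alongside the reported percentage can help flag header lines that were mis-transcribed, bearing in mind that the reported figure is an integer.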

Example 1549 

A DNA sequence (GBSx1640) was identified in S.agalactiae <SEQ ID 4779> which encodes the amino
acid sequence <SEQ ID 4780>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Transmembrane 213 - 229 ( 212 - 229) 



Final Results

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
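Each localization summary ends with one `Certainty=` line per compartment, and the compartment marked `(Affirmative)` appears to be the one being predicted. A minimal Python sketch, assuming the `<location> --- Certainty=<value> (<call>)` layout shown in these blocks (and tolerating the stray spaces seen in some lines), that extracts the three scores and picks the highest-certainty affirmative call. `parse_final_results` and `best_call` are hypothetical names, not part of any tool.

```python
import re

# Assumed line layout: "<location> --- Certainty=<value> (<call>) < succ>".
# \W* accepts "---", a dash, or plain spaces between the location and
# "Certainty"; \s* inside the number tolerates spacing such as "0. 2495".
_RESULT = re.compile(
    r"(bacterial (?:cytoplasm|membrane|outside))\W*"
    r"Certainty=\s*(\d+)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)


def parse_final_results(lines):
    """Return (location, certainty, call) tuples for each recognised line."""
    results = []
    for line in lines:
        match = _RESULT.search(line)
        if match:
            location, whole, frac, call = match.groups()
            results.append((location, float(f"{whole}.{frac}"), call.strip()))
    return results


def best_call(results):
    """Pick the highest-certainty 'Affirmative' location, or None if there is none."""
    affirmative = [r for r in results if r[2] == "Affirmative"]
    return max(affirmative, key=lambda r: r[1]) if affirmative else None
```

Applied to the Example 1549 lines, `best_call` would select `bacterial membrane` with certainty 0.2211.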

A related GBS nucleic acid sequence <SEQ ID 9413> which encodes amino acid sequence <SEQ ID 9414>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14825 GB:Z99118 similar to rRNA methylase [Bacillus subtilis]
Identities = 96/228 (42%), Positives = 143/228 (62%), Gaps = 5/228 (2%) 

Query: 3 QKICYRKSSYLIEGWHLFEEAEKYGAQFLNIFVT-ETAIDR-LRKPERAIVVTDDVLKELT 60

+++ + +++LIEG HL EEA K I V ET I L + ++++D +T 

Sbjct: 22 KERTKTNTFLIEGEHLVEEALKSPGIVKEILVKDETRIPSDLETGIQCYMLSEDAFSAVT 81



Query: 61 DSQTPQGIVAEIAFQETRWTDIKKGRFLVLEDVQDPGNLGTMVRTADAANFDAVFLSQKS 120 
Query: 121 ADLYNQKTLRSMQGSHFHLPVFRVEIEQFVNFCKAEGITMIATTLSEQSVNYKNLPKYDY 180

AD +N KTLRS QGSHFH+PV R + +V+ KAEG+ + T L + Y+ +P+ +
Sbjct: 140 ADAFNGKTLRSAQGSHFHIPVVRRNLPSYVDELKAEGVKVYGTAL-QNGAPYQEIPQSES 198

Query: 181 FALIMGNEGQGISKTMTEFADVLAHIEMPGQAESLNVAVAAGVVIFSL 228

FALI+GNEG G+ + E+ D+ ++ + GQAESLNVAVAA ++++ L 
Sbjct: 199 FALIVGNEGAGVDAALLEKTDLNLYVPLYGQAESLNVAVAAAILVYHL 246

A related DNA sequence was identified in S.pyogenes <SEQ ID 4781> which encodes the amino acid
sequence <SEQ ID 4782>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.97 Transmembrane 229 - 245 ( 228 - 245) 

Final Results

bacterial membrane --- Certainty=0.2190 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 141/229 (61%), Positives = 178/229 (77%)



Query: 1 ...
Sbjct: 17 ...

Query: 61 ...
DS +PQGIVAE+ + + KG++LVLEDVQDPGNLGT++RTADAA FD VFLS+KS
Sbjct: 77 ...

Query: 121 ...
AD+YNQKTLRSMQGSHFHLE++R ++ Q + ++ATTLS++SV+YK+L ++
Sbjct: 137 ...

Query: 181 ...
AL++GNEGQGIS M AD L HI MPGQAESLNVAVAAG++IFSLI
Sbjct: 197 ...VLGNEGQGISAEMAAIADQLVHITMPGQAESLNVAVAAGILIFSLI 245

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8839> and protein <SEQ ID 8840> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: -7.98 
GvH: Signal Score (-7.5): -3.86 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -3.03 threshold: 0.0 

INTEGRAL Likelihood = -3.03 Transmembrane 213 - 229 ( 212 - 229)
PERIPHERAL Likelihood = 5.14 149 
modified ALOM score: 1.11 



Final Results

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF0246B(259 - 984 of 1287)

EGAD|107730|BS2859 (4 - 246 of 248) hypothetical protein {Bacillus subtilis}
GP|1770029|emb|CAA99602.1||Z75208 hypothetical protein {Bacillus subtilis}
GP|2635330|emb|CAB14825.1||Z99118 similar to rRNA methylase {Bacillus subtilis}
PIR|G69984|G69984 rRNA methylase homolog ysgA - Bacillus subtilis
%Match =20.3
%Identity =43.0 %Similarity =62.3
Matches = 105 Mismatches = 87 Conservative Subs. = 47

A*RNPTP*TRPETIK*TFFIT*PLF*YNRXMTTIITSKSN1^
MKQIESAKNQKVKDWKKLHTKKERTKTNTFLIEGEHLVEEALKSPGIVK

WIFVT-ETAI-DRLRKPERAIVVTDDVLKELTDSQTPQGIVAEIAFQETRWTDIKKGRFLVLEDVQDPGNLGTMVRTADA
EILVKDETRIPSDLETGIQCYMLSEDAFSAVTETETPQQIAAVCHMPEEKIA--TARKVLLIDAVQDPGNLGTMIRTADA

ANFDAVFLSQKSADLYNQKTLRSMQGSHFHLPVFRVEIEQFVNFCKAEGITMIATTLSEQSVNYKNLPKYDYFALIMGNE
AGLDAVVIGDGTADAFNGKTLRSAQGSHFHIPVVRRNLPSYVDELKAEGVKVYGTAL-QNGAPYQEIPQSESFALIVGNE

GQGISKTMTEFADVLAHIEMPGQAE
GAGVDAALLEKTDLNLYVPLYGQAESLNVAVAAAILVYHLRG

SEQ ID 8840 (GBS430) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 77 (lane 5; MW 29kDa). 

GBS430-GST was purified as shown in Figure 220, lane 8. 

Example 1550 

A DNA sequence (GBSx1641) was identified in S.agalactiae <SEQ ID 4783> which encodes the amino
acid sequence <SEQ ID 4784>. This protein is predicted to be acylphosphatase (acyP). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq



Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10073> which encodes amino acid sequence <SEQ ID 
10074> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

itative [Thermotoga maritima] 
I (58%) , Gaps = 3/88 (3%) 

Query: 24 MKKVHLIVSGRVQGVGFRYATYSLALEIGDIYGRVtTO 83 
MK + + V G VQGVGFRY T +A +G + G V N DDG+V I A+ D N + +F+ 
Sbjct: 1 MKALKIRVEGIVQGVGFRYFTRRVAKSLG-VKGYVMNMDEGSVFIHAEG-DENALRRFLN 58

Query: 84 KIRKGPSKWSKVTYVDIKLDNFDDFNDF 111

++ KGP + VT V ++ + + DF 
Sbjct: 59 EVAKGPPA-AWTNVSVEETTPEGYEDF 85 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4785> which encodes the amino acid 
sequence <SEQ ID 4786>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2433 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 69/95 (72%), Positives = 85/95 (88%)

Query: 19 KRGQVMKKVHLIVSGRVQGVGFRYATYSLALEIGDIYGRVWNNDDGTVEILAQSTDSNKM 78

K +M+KV LIVSGRVQGVGFRYAT++LAL+IGDIYGRVWNN+DGTVEILAQS DS+K+ 
Sbjct: 7 KFALLMQKVRLIVSGRVQGVGFRYATHTLALDIGDIYGRVWNfflroGTVEILAQSKDSDKI 66

Query: 79 TQFIQKIRKGPSKWSKVTYVDIKLDNFDDFNDFKM 113

FIQ++RKGPSKW+KVTYVD+ + NF+DF DF++ 
Sbjct: 67 ATFIQEVRKGPSKWAKVTYVDVTMANFEDFQDFQI 101

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1551

A DNA sequence (GBSx1642) was identified in S.agalactiae <SEQ ID 4787> which encodes the amino
acid sequence <SEQ ID 4788>. This protein is predicted to be membrane protein homolog (yidC). Analysis
of this protein sequence reveals the following:

Possible site: 16 

>>> May be a lipoprotein

INTEGRAL Likelihood =-12.52 Transmembrane 60 - 76 ( 54 - 83)
INTEGRAL Likelihood = -3.66 Transmembrane 178 - 194 ( 177 - 196)
INTEGRAL Likelihood = -2.76 Transmembrane 140 - 156 ( 137 - 157)
INTEGRAL Likelihood = -2.60 Transmembrane 216 - 232 ( 213 - 232)

Final Results

bacterial membrane --- Certainty=0.6010 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10075> which encodes amino acid sequence <SEQ ID 
10076> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF03934 GB:AF139908 membrane protein homolog [Listeria 
monocytogenes] 

Identities = 82/222 (36%), Positives = 133/222 (58%), Gaps = 4/222 (1%)

Query: 44 PMANLITYFAQHQGLGFGVAIIIVTVIVRVVILPLGLYQSWKASYQAEKMAYFKPLFEPI 103

P + I + A+ G +G+AIII T+++R +I+PL L + KMA KP + I 

Sbjct: 3 PFTSFIMFVAKFVGGNYGIAIIITTLLIRALIMPLNLRTAKAQMGMQSKMAVAKPEIDEI 62
Query: 104 NERLRNAKTQEEKLAAQTELMTAQRENGLSMFGGIGCLPLLIQMPFFSAIFFAARYTPGV 163

RL+ A ++EE+ Q E+M + ++ +GCLPLLIQMP A ++A R + + 
Sbjct: 63 QARLKRATSKEEQATIQKEMMAVYSKYNINPMQ-MGCLPLLIQMPILMAFYYAIRGSSEI 121

Query: 164 SSATFLGLNLGQKSLTLTVIIAILYFVQSWLSMQGVPDEQRQQMKTMMYLMPIMMVFMSI 223

+S TFL NLG + L +I ++Y Q ++SM G EQ++QMK + + PIM++F+S
Sbjct: 122 ASHTFLWFNLGSPDMVLAIIAGLVYIAQYFVSMIGYSPEQKKQMKIIGLMSPIMILFVSF 181

Query: 224 SLPASVALYWFIGGIFSIIQQLVT--TYVLK-PKLRRKVEEE 262

+ P+++ALYW +GG+F Q L+T Y+ K P+++ +EE
Sbjct: 182 TAPSALALYWAVGGLFLAGQTLLTKKLYMNKHPEIKVMEQEE 223 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4789> which encodes the amino acid 
sequence <SEQ ID 4790>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> May be a lipoprotein

INTEGRAL Likelihood = -9.55 Transmembrane 62 - 78 ( 54 - 82) 

INTEGRAL Likelihood = -2.81 Transmembrane 178 - 194 ( 177 - 195) 

INTEGRAL Likelihood = -0.90 Transmembrane 216 - 232 ( 215 - 232) 

Final Results 

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF03934 GB:AF139908 membrane protein homolog [Listeria monocytogenes]
Identities = 89/218 (40%), Positives = 132/218 (59%), Gaps = 2/218 (0%) 

Query: 43 KPMSYFIDYFANNAGLGYGLAIIIVTIIVRTLILPLGLYQSWKASYQSEKMAFLKPVFEP 102 

+P + FI + A G YG+AIII T+++R LI+PL L + KMA KP + 

Sbjct: 2 QPFTSFIMFVAKFVGGNYGIAIIITTLLIRALIMPLNLRTAKAQMGMQSKMAVAKPEIDE 61 

Query: 103 INKRIKQANSQEEKMAAQTELMAAQRAHGINPLGGIGCLPLLIQMPFFSAMYFAAQYTKG 162 

I R+K+A S+EE+ Q E+MA + INP+ +GCLPLLIQMP A Y+A + + 
Sbjct: 62 IQARLKRATSKEEQATIQKEMMAVYSKYNINPMQ-MGCLPLLIQMPILMAFYYAIRGSSE 120 

Query: 163 VSTSTFMGIDLGSRSLVLTAIIAALYFFQSWLSMIAVSEEQREQMKTMMYTMPIMMIFMS 222

+++ TF+ +LGS +VL I +Y Q ++SM+ S EQ++QMK + PIM++F+S 
Sbjct: 121 IASHTFLWFNLGSPDMVLAIIAGLVYLAQYFVSMIGYSPEQKKQMKIIGLMSPIMILFVS 180 

Query: 223 FSLPAGVGLYWLVGGFFSIIQQLITTYLLKPRLHKQIK 260

F+ P+ + LYW VGG F Q L+T L + H +IK 
Sbjct: 181 FTAPSALALYWAVGGLFLAGQTLLTKKLYMNK-HPEIK 217 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 203/309 (65%), Positives = 254/309 (81%), Gaps = 2/309 (0%)

Query: 1 MKKTLKRILFSSLSLSMLLLLTGCVSVDKAGKPYGVIWNTLGVPMANLITYFAQHQGLGF 60

+K TL RILFS L+LS+LL LTGCV D G P G+IW LG PM+ I YFA + GLG+ 
Sbjct: 1 LKLTLNRILFSGLALSILLTLTGCVGRDAHGNPKGMIWEFLGKPMSYFIDYFANNAGLGY 60 

Query: 61 GVAIIIVTVIVRVVILPLGLYQSWKASYQAEKMAYFKPLFEPINERLRNAKTQEEKLAAQ 120

G+AIIIVT+IVR +ILPLGLYQSWKASYQ+EKMA+ KP+FEPIN+R++ A +QEEK+AAQ 
Sbjct: 61 GLAIIIVTIIVRTLILPLGLYQSWKASYQSEKMAFLKPVFEPINKRIKQANSQEEKMAAQ 120 

Query: 121 TELMTAQRENGLSMFGGIGCLPLLIQMPFFSAIFFAARYTPGVSSATFLGLNLGQKSLTL 180 

TELM AQR +G++ GGIGCLPLLIQMPFFSA++FAA+YT GVS++TF+G++LG +SL L 
Sbjct: 121 TELMAAQRAHGINPLGGIGCLPLLIQMPFFSAMYFAAQYTKGVSTSTFMGIDLGSRSLVL 180 

Query: 181 TVIIAILYFVQSWLSMQGVPDEQRQQMKTMMYLMPIMMVFMSISLPASVALYWFIGGIFS 240

T IIA LYF QSWLSM V +EQR+QMKTMMY MPIMM+FMS SLPA V LYW +GG FS 
Sbjct: 181 TAIIAALYFFQSWLSMIAVSEEQREQMKTMMYTMPIMMIFMSFSLPAGVGLYWLVGGFFS 240
Query: 241 IIQQLVTTYVLKPKLRRKVEEEYTKNPPKAYK... 300

IIQQL+TTY+LKP+L ++++EEY KNPPKAY++ ++RKDVT S ++N + K+N 
Sbjct: 241 IIQQLITTYLLKPRLHKQIKEEYAKNPPKAYQSTSSRKDVTPSQNMEQAN--LPKKIKSN 298


Query: 301 RNAGKQKRR 309 

RNAGKQ++R 
Sbjct: 299 RNAGKQRKR 307 

A related GBS gene <SEQ ID 8841> and protein <SEQ ID 8842> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: 23 Crend: 6 
McG: Discrim Score: 8.74 
GvH: Signal Score (-7.5): -1.47 
Possible site: 16

>>> May be a lipoprotein

ALOM program count: 4 value: -12.52 threshold: 0.0 

INTEGRAL Likelihood =-12.52 Transmembrane 60 - 76 ( 54 - 83) 
INTEGRAL Likelihood = -3.66 Transmembrane 178 - 194 ( 177 - 196) 
INTEGRAL Likelihood = -2.76 Transmembrane 140 - 156 ( 137 - 157)

INTEGRAL Likelihood = -2.60 Transmembrane 216 - 232 ( 213 - 232) 
PERIPHERAL Likelihood = 0.74 235 
modified ALOM score: 3.00 

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6010 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

37.9/63.7% over 193aa 

Bacillus subtilis 

EGAD|45886| hypothetical 30.7 kd lipoprotein in glnq-ansr intergenic region precursor Insert characterized

SP|P54544|YQJG_BACSU HYPOTHETICAL 30.7 KDA LIPOPROTEIN IN GLNQ-ANSR INTERGENIC REGION PRECURSOR. Insert characterized

GP|1303958|dbj|BAA12613.1||D84432 YqjG Insert characterized

GP|2634823|emb|CAB14320.1||Z99116 similar to lipoprotein SpoIIIJ-like Insert characterized

PIR|G69963|G69963 lipoprotein SpoIIIJ-like homolog yqjG - Insert characterized
ORF02470(478 - 1038 of 1530)

EGAD|45836|BS2384 (63 - 256 of 275) hypothetical 30.7 kd lipoprotein in glnq-ansr intergenic region precursor {Bacillus subtilis}
SP|P54544|YQJG_BACSU HYPOTHETICAL 30.7 KDA LIPOPROTEIN IN GLNQ-ANSR INTERGENIC REGION PRECURSOR.
GP|1303958|dbj|BAA12613.1||D84432 YqjG {Bacillus subtilis}
GP|2634823|emb|CAB14320.1||Z99116 similar to lipoprotein SpoIIIJ-like {Bacillus subtilis}
PIR|G69963|G69963 lipoprotein SpoIIIJ-like homolog yqjG - Bacillus subtilis
%Match =13.0
%Identity =37.9 %Similarity =63.7
Matches = 72 Mismatches = 65 Conservative Subs. = 49

FCGSIV*FLKKK*NR*VY*KLEELKTLKKTLKRILFSSLSLSMLLLLTGCVSVDKAGKPYGVIWNTLGVPMANLITYFAQ
MLKTYQKLLAMGIFLIVLCSGNAAFAATNQVGGLSNVGFFHDYLIEPFSALLKGVAG

HQGLGFGVAIIIVTVIVRVVILPLGLYQSWKASYQAEKMAYFKPLFEPINERLRNAKTQEEKLAAQTELMTAQRENGLSM
LFHGEYGLSIILVTIIWIVVLPLFVNQFKKQRIFQEraiAVIKPQVDSIQVKLKKTKDPEKQKELQMEMMKLYQEHNINP

FGGIGCLPLLIQMPFFSAIFFAARYTPGVSSATFLGLNLGQKSLTLTVIIAILYFVQSWLSMQ--GVPDE--QRQQ
-LAMGCLPMLIQSPIMIGLYYAIRSTPEIASHSFLWFSLGQSDILMSLSAGIMYFVQAYIAQKLSAKYSAVPQNPAAQQS

MKTMMYLMPIMMVFMSISLPASVALYWFIGGIFSIIQQLVTTY
AKLMVFIFPWMTIFSLWPAALPLYWFTSGLFLTVQNIVLQMTHHKSKKTAALTESVK



37.2/52.0% over 220aa 
Listeria monocytogenes 
GP|6117974| membrane protein homolog Insert characterized

ORF02470(430 - 1086 of 1530)

GP|6117974|gb|AAF03934.1|AF139908_4|AF139908(3 - 223 of 237) membrane protein homolog {Listeria monocytogenes}
%Match =14.6
%Identity =37.1 %Similarity =62.0
Matches = 32 Mismatches = 81 Conservative Subs. = 55

K*NR*VY*KLEELKTLKKTLKRILFSSLSLSMLLLLTGCVSVDKAGKPYGVIWNTLGVPMANLITYFAQHQGLGFGVAII
IQPFTSFIMFVAKFVGGNYGIAII

IVTVIVRVVILPLGLYQSWKASYQAEKMAYFKP
ITTLLIRALIMPLNLRTAKAQMGMQSKMAVAKPEIDEIQARLKRATSKEEQATIQKEMMAVYSKYNINP-MQMGCLPLLI

QMPFFSAIFFAARYTPGVSSATFLGLNLGQKSLTLTVIIAILYFVQSWLSMQGVPDEQRQQMKTMMYLMPIMMVFMSISL
QMPILMAFYYAIRGSSEIASHTFLWFNLGSPDMVLAIIAGLVYIAQYFVSMIGYSPEQKKQMKIIGLMSPIMILFVSFTA

PASVALYWFIGGIFSIIQQLVTT--YVLK-PKLRRKVEEEYTKNPPKAYKANNARKDVTNSTKATESNQAIITSKKTNRN
PSALALYWAVGGLFLAGQTLLTKKLYMNKHPEIKVMEQEEKEFEQIVEEQKKEK

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1552

A DNA sequence (GBSx1644) was identified in S.agalactiae <SEQ ID 4791> which encodes the amino
acid sequence <SEQ ID 4792>. This protein is predicted to be amino acid ABC transporter, permease 
protein. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.98 Transmembrane 32 - 48 ( 23 - 53)
INTEGRAL Likelihood = -9.18 Transmembrane 195 - 211 ( 189 - 213)
INTEGRAL Likelihood = -8.70 Transmembrane 72 - 88 ( 62 - 93)



Final Results

bacterial membrane --- Certainty=0.4991 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12131 GB:Z99105 similar to amino acid ABC transporter 
(permease) [Bacillus subtilis] 
Identities = 115/217 (53%), Positives = 168/217 (76%)

Query: 2 INWDAIFNLELAVKAFPSVIQGLPYTIGLSLVGFILGAIVGFFVALMKMSHFRLLRYLAN 61
I W+ I FN +LA+++FP VI+G+ YT+ +S V G+++G F++L +MS LLR+ A
Sbjct: 5 IQWEYIFNTKLAIESFPYVIKGIGYTLLISFVSMFAGTVIGLFISLARMSKLALLRWPAK 64

Query: 62 IHISLMRGIPLMVLLFLIYFGLPFIGIQLDAVTASIVGFTMMSSAYISEIIRAALLAVDH 121
++IS MRG+P++V+LF++YFG P+IGI+ AVTA+++GF++ S+AYI+EI R+A+ +V+
Sbjct: 65 LYISFMRGVPILVILFILYFGFPYIGIEFSAVTAALIGFSLNSAAYIAEINRSAISSVEK 124

Query: 122 ...
GQWEAA +LGL RGII+PQ+ RIALP L+NVLLD++K+SSL AMITVP++ +AK
Sbjct: 125 ...

Query: 182 ...
I+GG DYMT YIL ALIYW IC++ A+ Q+ EK+
Sbjct: 185 IIGGREFDYMTMYILTALIYWAICSIAAVFQNILEKK 221

A related DNA sequence was identified in S.pyogenes <SEQ ID 4793> which encodes the amino acid
sequence <SEQ ID 4794>. Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.79 Transmembrane 186 - 202 ( 184 - 205)
INTEGRAL Likelihood = -5.84 Transmembrane 26 - 42 ( 21 - 43)
INTEGRAL Likelihood = -4.78 Transmembrane 57 - 73 ( 56 - 84)
INTEGRAL Likelihood = -1.59 Transmembrane 86 - 102 ( 86 - 103)

Final Results

bacterial membrane --- Certainty=0.3718 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12131 GB:Z99105 similar to amino acid ABC transporter 
(permease) [Bacillus subtilis] 
Identities = 113/214 (52%), Positives = 157/214 (72%)





Query: 1 ...
Sbjct: 10 ...

Query: 61 ...
Sbjct: ...

Query: 121 ...
A +LGL Y ++ IILPQ+ RIA+PPL NV++D++K+SSLAAMITVP++ Q+AKIIGGR
Sbjct: 130 ASSLGLSYWQTMRGIILPQSIRIALPPLANVLLDLIKASSLAAMITVPELLQHAKIIGGR 189

Query: 181 EWDYMSMYILVAFIYWLIAFLLERYQEFLENKLA 214
E+DYM+MYIL A IYW I + +Q LE K A
Sbjct: 190 EFDYMTMYILTALIYWAICSIAAVFQNILEKKYA 223

An alignment of the GAS and GBS proteins is shown below. 

Identities = 110/213 (51%), Positives = 156/213 (72%)

Query: 7 IFNLELAVKAFPSVIQGLPYTIGLSLVGFILGAIVGFFVALMKMSHFRLLRYLANIHISL 66 

+ N+ L + V+ GLPYT+G+SL+ F G +G +AL+ S L+ YL +IS+ 
Sbjct: 1 MINIPLMKDSLGFVLSGLPYTLGISLLSFFTGLFLGLGLALLGRSRQPLIHYLVRAYISI 60 
Query: 67 MRGIPLMVLLFLIYFGLPFIGIQLDAVTASIVGFTMMSSAYISEIIRAALLAVDHGQWEA 126

MRG+P++V+LF++YFGLP+ G++L A+ + +GF+M+S+AYISE+ R+++ A+D GQWEA 
Sbjct: 61 MRGVPMIVVLFVLYFGLPYYGLELPALLCAYLGFSMVSAAYISEVFRSSIEAIDKGQWEA 120

Query: 127 ARALGLKTPTIYRGIIIPQATRIALPSLSNVLLDMVKSSSLTAMITVPDIFNNAKIVGGT 186

A+ALGL + + II+PQA RIA+P L NV++DMVKSSSL AMITVPDIF NAKI+GG 
Sbjct: 121 AKALGLPYALMVKKIILPQAFRIAVPPLGNVIIDMVKSSSLAAMITVPDIFQNAKIIGGR 180 

Query: 187 YSDYMTAYILVALIYWVICTLYAIIQDWWEKRL 219

DYM+ YILVA IYW+I L Q++ E +L
Sbjct: 181 EWDYMSMYILVAFIYWLIAFLLERYQEFLENKL 213 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1553 

A DNA sequence (GBSx1645) was identified in S.agalactiae <SEQ ID 4795> which encodes the amino
acid sequence <SEQ ID 4796>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12132 GB:Z99105 similar to amino acid ABC transporter
(binding protein) [Bacillus subtilis]
Identities = 127/276 (46%), Positives = 183/276 (66%), Gaps = 12/276 (4%) 

Query: 3 KTILLGLVGLSAI4TLAACS--NGQSSKETTWDNIKKDGVLKVATPATLYPTSYYDDHK-- 58 

K ++ + LAACS N SK+T W+ IK G + VAT TLYPTSY+D 

Sbjct: 8 KAVIFSFTMAFFLILRACSGKNEADSKDTGWEQIKDKGKIVVATSGTLYPTSYHDTDSGS 67



Query: 59 -KLTGYEIDMMKAIAKKLKIKVKFVEVGVAESFTSVDSGKVDVAVN

KLTGYE4++++ AK+L +KV+F E+G+ T+V+SG+VD A N+ D T +R +K+ F 
Sbjct: 68 DKLTGYEVEVTOEAAKRLGLKVEFKEMGIEC-MIjTAVNSGQVDAAANDIDVTI<DREEKFAF 127 

Query: 118 SQPYKYSVGGMIVRADGSSKITAKDLSDWKGKKAGGGAGTQYMKIAKQQGAEPVIYDNVT 177 

S PYKYS G IVR D SI K L D KGKKA GAT YM++A++ GA+ VIYDN T 
Sbjct: 128 STPYKYSYGTAIVRKDDLSGI - -ICTLKDLKGKKAAGAATTVYMEVARKYGAKEVIYDNAT 185 

Query: 178 NDVYLRDVSTGRTDFIPNDYYTQVIAVKYVTKQYPDIKVKM-GDVKYNPTECGIVMSKKD 236

N+ YL+DV+ GRTD I NDYY Q +A+ +PD+ + + D+KY P +Q +VM K + 

Sbjct: 186 NEQYLKDVANGRTDVILNDYYLQTLAL AAFPDLNITIHPDIKYMPNKQALVMKKSN 241 

Query: 237 KSLKTKIDAAIKDMKKDGSLKKISEKYYAGQDLTKE 272 

+L+ K++ A+K+M KDGSL K+S++++ D++K+ 
Sbjct: 242 AALQKKMNEALKEMSKDGSLTKLSKQFFNKADVSKK 277 

There is also homology to SEQ ID 1190.

SEQ ID 4796 (GBS183) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 26 (lane 2; MW 33kDa). 

GBS183-His was purified as shown in Figure 199, lane 7. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1554 

A DNA sequence (GBSx1646) was identified in S.agalactiae <SEQ ID 4797> which encodes the amino
acid sequence <SEQ ID 4798>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.1514 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF09821 GB:AE001885 6-aminohexanoate-cyclic-dimer hydrolase
[Deinococcus radiodurans] 
Identities = 178/488 (36%), Positives = 265/488 (53%), Gaps = 17/488 (3%) 

Query: 5 DATAMVQAIKQHKISSQELVEQAIYKIEEQNVSWNAWSKQYNEARQAAKYANESNA--- 61

DA + Q ++ ++S++++ AI++ + NV++NAW Y++ A+ + + A 
Sbjct: 54 DALDIAQLFRRGELSAEDMCTAAIHRAQVVNVALNAVVYPLYDQGIAQARATDAARARGE 113

Query: 62 PFAGVPILLKDLGQNQKGQLSTSGSQLFKHYHAKQTDYLVQSFEKLGFIILGRTNT 117

PFAGVP L+KD G G T G++ ++ + D LV+ ++ G + LG+TNT 
Sbjct: 114 QATGPFAGVPFLVKDFGSRLAGVPHTGGTRAYRDQIPEWDDELVRRWQAAGLLPLGKTNT 173

Query: 118 PEFGFKNISDGQLHGNVNLPFDHSRNAGGSSGGAAAAVSSGMVPIAGASDGGGSIRIPAS 177

PEF +++ +LHG P+D R GGSSGG+A+AV++G+VP+AGA DGGGSIRIPAS 
Sbjct: 174 PEFALMGVTEPELHGPTRNPWDLGRTPGGSSGGSASAVAAGIVPLAGAGDGGGSIRIPAS 233

Query: 178 FNGLIGLKPSRGRIPVGPSSYRGWQGASSHFALTKSVRDTKRLLYYLQSYQVESPF 233

GL GLKPSRGR+P G WQGA+ LT+SVRD+ LL Q + P 

Sbjct: 234 CCGLFGLKPSRGRVPCGDGVGEPWQGAAVEHVLTRSVRDSAALLDLEQGPDAGAALFLPS 293 

Query: 234 PLKKLSKESLFEFSVSKPLKIAVLMDSPLKTKVSSEAKAAIKEAADFLSQKGNHLELVEQ 293

P + S+E E L+I PL V E AA++ AA L G+ +E V 

Sbjct: 294 PERPYSEEVGRE PGRLRIGFSTAHPIJSBSWPECTARVQGAARLLESLGHEvEEVAL 350 

Query: 294 PLDGIHSMKTYCMMNSVETAAMFDDIEKSLGRSMEFSDMELMTWAMYQSGQRVLAKDYSK 353 

P DG + + M+ ET A + +LGR SD+E +TW + Q G+ A D++ 
Sbjct: 351 PWDGPALAQAFLMLYFGETGASLAALRDTLGRPARASDVEAVTWLLGQLGRSYSAADFAA 410

Query: 354 LLDSWDQFAATMARFHENYDLILTAATNQPAPFHGQFDLDETLQKQLRHMGEFSVSE 410

SW+ A M RFH+NYDL+LT P G+ +L++M + 

Sbjct: 411 ARASWNVHARAMGRFHQNYDLLLTPVLATPPLQIGELQPRGVQAALLRAAQQMDVSGLLR 470 

Query: 411 QQDLIWKMFEDSMAWTPFTHQPNLTGQPSLAIPTHLTKEGLPLGVQLTAAKGREDLLLAV 470 

+ + + D + P+T NLTGQP++++P H T +G P+GVQ A RED+LL +
Sbjct: 471 RSGQVDAIATDILEKMPYTQLANLTGQPAMSVPLHWTADGDPVGVQFVAPLAREDVLLRL 530

Query: 471 AELFEKEK 478 

A E+ + 
Sbjct: 531 AGQLEQAR 538 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4047> which encodes the amino acid 
sequence <SEQ ID 4048>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 277/484 (57%), Positives = 348/484 (71%), Gaps = 2/484 (0%)

Query: 1 MVFKDATAMVQAIKQHKISSQELVEQAIYKIEEQWSVNAWSKQVNEARQAAKYANESN 60

M ++DATAM A++ + + ELV QAIYK ++ N ++NA+ S+++ A + AK + S 
Sbjct: 1 MTYQDATAMAIAVQTGQTTPLELVTQAIYKAKKLNPTLNAITSERFEAALEEAKQRDFSG 60

Query: 61 APFAGVPILLKDLGQNQKGQLSTSGSQLFKHYHAKQTDYLVQSFEKLGFIILGRTNTPEF 120 

PFAGVP+ LKDLGQ KG STSGS+LFK Y A +TD V+ E LGFIILGR+NTPEF
Sbjct: 61 LPFAGVPLFLKDLGQELKGHSSTSGSRLFKEYQATKTDLFVKRLEALGFIILGRSNTPEF 120 

Query: 121 GFKNISDGQLHGNVNLPFDHSRNAGGSSGGAAAAVSSGMVPIAGASDGGGSIRIPASFNG 180 

GFKNISD LHG VNLP D++RNAGGSSGGAAA VSSG+ +A ASDGGGSIRIPASFNG 
Sbjct: 121 GFKNISDSSLHGPVNLPRDNTRNAGGSSGGAAALVSSGISALATASDGGGSIRIPASFNG 180 

Query: 181 LIGLKPSRGRIPVGPSSYRGWQGASSHFALTKSVRDTKRLLYYLQSYQVESPFPLKKLSK 240

LIGLKPSRGR+PVGP SYR WQGAS HFALTKSVRDT+ LLYYLQ Q+ESPFPL L+K 
Sbjct: 181 LIGLKPSRGRMPVGPGSYRSWQGASVHFALTKSVRDTRNLLYYLQMEQMESPFPLATLTK 240

Query: 241 ESLFEFSVSKPLKIAVLMDSPLKTKVSSEAKAAIKEAADFLSQKGNHL-ELVEQPLDGIH 299 

+S+++ S+ +PL IA + VS + A+++A +L ++G+ L EL E P++ 

Sbjct: 241 DSIYQ-SLQRPLTIAFYQRLSDGSPVSLDTAKALRQAVTWLREQGHQLVELEEFPVNMTE 299 

Query: 300 SMKTYCMMNSVETAAMFDDIEKSLGRSMEFSDMELMTWAMYQSGQRVLAKDYSKLLDSWD 359

++ Y +MNSVETAAMF DIE + GR M DME MTWA+YQSG+ + A YS++L WD 
Sbjct: 300 VIRHYYIMNSVETAAMFADIEDTFGRPMTKDDMETMTWAIYQSGKDIPAWRYSQVLQKWD 359

Query: 360 QFAATMARFHENYDLILTAATNQPAPFHGQFDLDETLQKQLRHMGEFSVSEQQDLIWKMF 419 

++ATMA FHE YDL+LT TN PAP HG+ DLL PS EQ +L+ MF

Sbjct: 360 TYSATMASFHETYDLLLTFTTNTPAPKHGELVPDSKLMANLAQAEIFSSEEQFNLVETMF 419 

Query: 420 EDSMAWTPFTHQPNLTGQPSLAIPTHLTKEGLPLGVQLTAAKGREDLLLAVAELFEKEKQ 479 

S+A P+T PNLTGQP++++PT+ TKEGL +G+QL AAKGREDLLL +AE FE 
Sbjct: 420 GKSLAINPYTALPNLTGQPAISLPTYETKEGLSMGIQLIAAKGREDLLLGIAEQFEAAGL 479

Query: 480 FKGP 483 
K P 

Sbjct: 480 LKIP 483 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1555 

A DNA sequence (GBSx1647) was identified in S.agalactiae <SEQ ID 4799> which encodes the amino
acid sequence <SEQ ID 4800>. This protein is predicted to be transcription elongation factor (greA). 
Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.5003 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14674 GB:Z99117 transcription elongation factor [Bacillus subtilis] 
Identities = 86/154 (55%), Positives = 114/154 (73%), Gaps = 1/154 (0%)
Query: 3 EKTYPMTQVEKDQLEKELEELKLVRRPEVVERIKIARSYGDLSENSEYDAAKDEQAFVEG 62

EK +PMT K +LE+ELE LK V+R EVVERIKIARS+GDLSENSEYD+AK+EQAFVEG 
Sbjct: 4 EKVFPMTAEGKQKLEQELEYLKTVKRKEVVERIKIARSFGDLSENSEYDSAKEEQAFVEG 63

Query: 63 QIQILETKIRYAEIIDSDAVAKDEVAIGKTVLVQEVGTNDKDTYHIVGAAGADIFSGKIS 122 

++ LE IR A+II+ D + V +GKTV E+ D+++Y IVG+A AD F GKIS 
Sbjct: 64 RVTTLENMIRNAKIIEDDG-GSNVVGLGKTVTFVELPDGDEESYTIVGSAEADPFEGKIS 122

Query: 123 NESPIAHALIGKKTGDLATIESPAGSYQVEIISV 156

N+SPIA +L+GKK + T+++P G V+I+ + 
Sbjct: 123 NDSPIAKSLLGKKVDEEVTVQTPGGEMLVKIVKI 156 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4801> which encodes the amino acid 
sequence <SEQ ID 4802>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4434 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 145/160 (90%), Positives = 149/160 (92%)

Query: 1 MAEKTYPMTQVEKDQLEKELEELKLVRRPEVVERIKIARSYGDLSENSEYDAAKDEQAFV 60

MAEKTYPMT EK+QLEKELEELKLVRRPE+VERIKIARSYGDLSENSEYDAAKDEQAFV
Sbjct: 17 MAEKTYPMTLTEKEQLEKELEELKLVRRPEIVERIKIARSYGDLSENSEYDAAKDEQAFV 76 

Query: 61 EGQIQILETKIRYAEIIDSDAVAKDEVAIGKTVLVQEVGTNDKDTYHIVGAAGADIFSGK 120 

EGQI LETKIRYAEIIDSDAVAKDEVAIGKTV+VQEVGT DKDTYHIVGAAGADIFSGK
Sbjct: 77 EGQISTLETKIRYAEIIDSDAVAKDEVAIGKTVIVQEVGTTDKDTYHIVGAAGADIFSGK 136

Query: 121 ISNESPIAHALIGKKTGDIATIESPAGSYQVEIISVEKTN 160 

ISNESPIA ALIGKKTGD IESPA +Y VEIISVEKTN 
Sbjct: 137 ISNESPIAQALIGKKTGDKVRIESPAATYDVEIISVEKTN 176

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1556 

A DNA sequence (GBSx1648) was identified in S.agalactiae <SEQ ID 4803> which encodes the amino
acid sequence <SEQ ID 4804>. This protein is predicted to be aminodeoxychorismate lyase-like protein. 
Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.64 Transmembrane 238 - 254 ( 230 - 260) 



Final Results 

bacterial membrane --- Certainty=0.6456 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF77615 GB:AF151720 aminodeoxychorismate lyase-like protein 
[Streptococcus thermophilus] 
Identities = 135/210 (64%), Positives = 171/210 (81%)
Query: 373 ICTTSTPYKADDFLKLVQDETFIKKMVAKYPinjIiGSIjPDKSKRIYQLEGYLFPATYNYYKD 432

K +ST K DFLKL++D+ FI KM AKYP LL +LP+ + A Y LEGYLFPATYN + D 
Sbjct: 5 KHSSTGLKEKDFLKLMKDDAFITKMKAKYPTLLANLPNSTDAKYVLEGYLFPATYNIHDD 64

Query: 433 TTLEGLVEDMISTMNTKl^PYYlSrTIKAKl^lSVHDVLTLSSLVEKEGSTDEDRRKIASVFY 492 

TT+E L E+M+ TM+T ++PYY TI + N +VN++LTL+SLVEKEG+TD+DR+ IASVFY 
Sbjct: 65 TTVESLAEEMLFTMDTHLSPYYATILSSNHNVNEILTLASLVEKEGATDDDRKNIASVFY 124

Query: 493 NRLSAGQALQSNIAILYAMGKLGDKTSLAEDAQINTSIKSPYNIYTHTGLMPGPVDSPSI 552 

NRL++ ALQSNIA+LY +GKLG +T+L EDA I+T+I SPYN Y + GLMPGPVDSPS+ 
Sbjct: 125 NRLNSDMALQSNIAVLYVLGKLGQETTLKEDATIDT 184

Query: 553 SAIEATIKPASTDYLYFVADVKTGNVYYAK 582 

SAIEA I P+ST Y+YFVADV TGNVY+A+
Sbjct: 185 SAIEAVINPSSTKYMYFVADVSTGNVYFAE 214 
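The percentages in the BLAST summary lines of this section can be checked against their counts. The sketch below assumes each percentage is the count divided by the alignment length, truncated to a whole number. That rounding rule is an assumption, though it is consistent with, for example, Positives = 403/603 being reported as 66% even though 403/603 is about 66.8%. The helper names `percent` and `check_summary` are hypothetical.

```python
import re

# One "Label = count/length (pct%)" field of a BLAST-style summary line.
FIELD = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def percent(count: int, length: int) -> int:
    """Percentage truncated to an integer (assumed convention, see above)."""
    return 100 * count // length


def check_summary(line: str) -> dict:
    """Map each field label to whether its printed percentage matches its counts."""
    return {
        label: int(pct) == percent(int(count), int(length))
        for label, count, length, pct in FIELD.findall(line)
    }


if __name__ == "__main__":
    summary = "Identities = 310/603 (51%) , Positives = 403/603 (66%) , Gaps = 86/603 (14%)"
    print(check_summary(summary))
    # {'Identities': True, 'Positives': True, 'Gaps': True}
```

A `False` entry marks a summary line whose counts and percentage disagree. In an OCR'd listing like this one, that may point to a misread digit.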

A related DNA sequence was identified in S.pyogenes <SEQ ID 4805> which encodes the amino acid 
sequence <SEQ ID 4806>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.91 Transmembrane 161 - 177 ( 155 - 183) 

Final Results 

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
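Across the PSORT listings in this section, the bacterial membrane certainty follows the ALOM INTEGRAL likelihood closely enough to suggest a linear relation. The sketch below checks certainty = 0.1 - 0.04 x likelihood against three pairs reported in the text: -13.64 with 0.6456, -7.91 with 0.4163, and -2.60 with 0.2041. This formula was inferred from those numbers alone. It is an assumption, not something stated in the source or quoted from PSORT documentation, and `membrane_certainty` is a hypothetical helper.

```python
# Inferred (not documented) relation between the ALOM INTEGRAL likelihood
# and the "bacterial membrane" certainty reported by PSORT in this section.
def membrane_certainty(integral_likelihood: float) -> float:
    return 0.1 - 0.04 * integral_likelihood


# (INTEGRAL likelihood, reported membrane certainty) pairs taken from the text.
OBSERVED = [(-13.64, 0.6456), (-7.91, 0.4163), (-2.60, 0.2041)]

if __name__ == "__main__":
    for likelihood, reported in OBSERVED:
        predicted = membrane_certainty(likelihood)
        print(f"{likelihood:7.2f} -> predicted {predicted:.4f}, reported {reported:.4f}")
```

If the relation holds, a likelihood of -13.64 corresponds to 0.6456. That would favour this value over the 0.5456 printed in the later listing for the related GBS protein.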

The protein has homology with the following sequences in the databases: 

>GP:AAF77615 GB:AF151720 aminodeoxychorismate lyase-like protein 
[Streptococcus thermophilus] 
Identities = 135/212 (63%) , Positives = 161/212 (75%) 

Query: 295 KTKKAKTPFNEKDFLDLVTDEAFIQDMVKRYPKLLATIPTKEKAIYRLEGYLFPATYNYY 354

K K + T EKDFL L+ D+AFI M +YP LLA +P AY LEGYLFPATYN + 
Sbjct: 3 KGKHSSTGLKEKDFLKLMKDDAFITKMKAKYPTLLANLPNSTDAKYVLEGYLFPATYNIH 62

Query: 355 KETTMRELVEDMLAAMDATLVPYYDKIAASGKTVNEVLTLASLVEKEGSTDDDRRQIASV 414
+TT+ L E+ML MD L PYY I +S VNE+LTLASLVEKEG+TDDDR+ IASV

Sbjct: 63 DDTTVESLAEEMLFTMDTHLSPYYATILSSNHNVNEILTLASLVEKEGATDDDRKNIASV 122

Query: 415 FYNRLNSGMALQSNIAILYAMGKLGEKTTLAEDATIDTTINSPYNIYTNTGLMPGPVASS 474

FYNRLNS MALQSNIA+LY +GKLG++TTL EDATIDT I+SPYN Y + GLMPGPV S

Sbjct: 123 FYNRLNSDMALQSNIAVLYVLGKLGQETTLKEDATIDTNIDSPYNDYVHKGLMPGPVDSP 182 

Query: 475 GVSAIEATLNPASTDYLYFVANVHTGEVYYAK 506 

+SAIEA +NP+ST Y+YFVA+V TG VY+A+ 
Sbjct: 183 SLSAIEAVINPSSTKYMYFVADVSTGNVYFAE 214 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 310/603 (51%) , Positives = 403/603 (66%) , Gaps = 86/603 (14%) 

Query: 1 MTEFlTODQHSlfflDQKSFKSQIIAELEEMTOLRKIiREEELYQKEQEAKEAARRTAQLMADY 60 

+T+F D + Q+SFKEQILAELE+AN++RK +EEEL+ 
Sbjct: 3 LTDFKDKDQQDQ-QRSFKEQILAELEKANQIRKEKEEELF 41 

Query: 61 EAQRLKDEREARAKALETKQRLEEQEKARIEAKIjIAEAAREEERRQAEQALASQEEQVIN 120 

++ LE +E AR A+D AE R++ A Q+E + + 

Sbjct: 42 QKELEAKEAARRTAQLYAEYKRQD AFQKESIAH 74 

Query: 121 QGMEPSRELDSGSKSSEFRTTENVPDIDLKADKTDVATAVPNQETEEIFLVRATDIPTEG 180 

+T ++ +A K V T+ + T + +E 

Sbjct: 75 NN KTAKH FQAIKGAVMTSEALKPT- - -LLSEK 103 
Query: 181 EWKLGEISELEPVAKEPIRVEDIiSKEEEGIMiS2iKNKHNKRER RQKADNVAKRIAR 237

EN L ++ A E +++ + +E + L+ + H+ R + RQ+ + AK+I+ 
Sbjct: 104 ENSSLKTTNKRWQANE LQETA8KESQVPLTIEKGHSVRRKLSKRQQTERAAKKIST 160 

Query: 238 ILISIIUjVLLLTAFVGYRFVDSAIKPVDSNSNKFVQVEIPIGSGNKLIGQILEKAGVIK 297

+LIS II+ LL G +V SA+ PVD NS+ FVQVEIP GSGNKLIGQIL+K G+IK

Sbjct: 161 VLISSIIITLLAVTLAGAGYVYSAMPVDI^NSDAFVQVEIPSGSGNKLIGQILQKKGLIK 220 

Query: 298 SATVFNYYSKFKNYSNFQSGYYNLKKSMTLDQIAAELEKGGTAEPTKPALGKILITEGYT 357

++TVF++Y+KFKN++NFQSGYYNL+KSM+L++IA+ L++GGTAEPTKP+LGKILI EGYT 
Sbjct: 221 NSTVFSFYTKFKNFTNFQSGYYNLQKSMSLEEIASALQEGGTAEPTKPSLGKILIPEGYT 280 

Query: 358 IKQIAKAIESN-KIDTKTTSTPYKADDFLKLVQDETFIKKMVAKYPNLLGSLPDKSKAIY 416

IKQIAKA+E N K TK TP+ DFL LV DE FI+ MV +YP LL ++P K KAIY 
Sbjct: 281 IKQIAKAVEHNSKGKTKKAKTPFNEKDFLDLVTDEAFIQDMVKRYPKLLATIPTKEKAIY 340

Query: 417 QLEGYLFPATYNYYKDTTLEGLVEDMISTMNTKMAPYYNTIKAKMMSVNDVLTLSSLVEK 476 

+LEGYLFPATYNYYK+TT+ LVEDM++ M+ + PYY+ I A +VN+VLTL+SLVEK
Sbjct: 341 RLEGYLFPATYNYYKETTMRELVEDMLAAMDATLVPYYDKIAASGKTVNEVLTLASLVEK 400 

Query: 477 EGSTDEDRRKIASVFYNRLSAGQALQSNIAILYAMGKLGDKTSLAEDAQINTSIKSPYNI 536

EGSTD+DRR+IASVFYNRL++G ALQSNIAILYAMGKLG+KT+LAEDA I+T+I SPYNI 
Sbjct: 401 EGSTDDDRRQIASVFYNRLNSGMALQSNIAILYAMGKLGEKTTLAEDATIDTTINSPYNI 460

Query: 537 YTNTGLMPGPVDSPSISAIEATIKPASTDYLYFVADVKTGNVYYAKDFETHKANVEKYIN 596 

YTNTGLMPGPV S +SAIEAT+ PASTDYLYFVA+V TG VYYAK FE H ANVEKY+N 
Sbjct: 461 YTNTGLMPGPVASSGVSAIEATLNPASTDYLYFVANVHTGEVYYAKTFEEHSANVEKYVN 520 

Query: 597 SQI 599 
SQI 

Sbjct: 521 SQI 523

A related GBS gene <SEQ ID 8843> and protein <SEQ ID 8844> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -17.88 
GvH: Signal Score (-7.5): -3.51 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -13.64 threshold: 0.0 

INTEGRAL Likelihood =-13.64 Transmembrane 238 - 254 ( 230 - 260)
PERIPHERAL Likelihood = 5.78 285 
modified ALOM score: 3.23 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.6456 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF0093K1417 - 2046 of 2400) 

GP|8574530|gb|AAF77615.1|AF151720_1|AF151720(5 - 214 of 214) aminodeoxychorismate lyase-like
protein {Streptococcus thermophilus}
%Match =17.5 

%Identity =64.3 %Similarity =81.4 

Matches = 135 Mismatches = 39 Conservative Sub.s = 35 

1236 1266 1296 1326 1356 1386 1416 1446

NYYSKFKNYSNFQSGYYNLKKSMTLDQIAAELEKGGTAEPTKPALGKILITEGYTIKQIAKAIESNKIDTKTTSTPYKAD

1476 1506 1536 1566 1596 1626 1656 1686

DFLKLVQDETFIKKMVAKYPNLLGSLPDKSKAIYQLEGYLFPA

DFLKLMKDDAFITKMKAKYPTLLANLPNSTDAKY
5 30 40 50 60 70 80 90

1716 1746 1776 1806 1836 1866 1896 1926

SVNDVLTLSSLVEKEGSTDEDRRKIASVFYNRLSAGQALQSNIAILYAMGKLGDKTSLAEDAQINTSIKSPYNIYTNTGL

NVNEILTLASLVEKEGATDDDRKNIASVFYNRLNSDMALQSNIAVLYVLGKLGQETTLKEDATIDTN

110 120 130 140 150 160 170

1956 1986 2016 2046 2076 2106 2136 2166

MPGPVDSPSISAIEATIKPASTDYLYFVADVKTGNVYYAKDFETHKANVEKYINSQIN*AYKHGASHHVYIFDLKK*KEK

MPGPVDSPSLSAIEAVINPSSTKYMYFVADVSTGNVYFAE
190 200 210
190 200 210 

SEQ ID 8844 (GBS370) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 64 (lane 6; MW 70kDa). 

GBS370-His was purified as shown in Figure 209, lane 10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1557 

A DNA sequence (GBSx1649) was identified in S.agalactiae <SEQ ID 4807> which encodes the amino
acid sequence <SEQ ID 4808>. Analysis of this protein sequence reveals the following:
Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0183 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10077> which encodes amino acid sequence
<SEQ ID 10078> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA98889 GB:Z74367 ORF YDR071C [Saccharomyces cerevisiae] 
Identities = 52/174 (29%) , Positives = 81/174 (45%) , Gaps = 18/174 (10%)

Query: 27 MSMIIRNGCLEDLQQVISIEQINFSEAEAASKKAMQERLTIMTDTFLVAEINGR 80

+ M IR +EDL+Q++++E F E AS++ + RL + + EI G+ 

jl IEDLKQILNLESQGFPPNERASEEI I SFRLINCPELCSGLFIREIEGKEVK 6 9 



Sbjct: 70 KETLIGHIMGTKIPHEYITIESMGKLQ VESSNHIGIHSWIKPEYQKKNLATLLLTD 126 

Query: 138 MKDLVVSQE-RDGISLTCHDDLISFYEMNGFKDEGES DSKHGGSLWYNM 185

+ +QE + I L H+ LI FYE GFK E+ D W +M 

Sbjct: 127 YIQKLSNQEIGNKIVLIAHEPLIPFYERVGFKIIAEirarVAKDKNFAEQKWIDM 180 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4809> which encodes the amino acid 
sequence <SEQ ID 4810>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence
Final Results

bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 87/159 (54%), Positives = 117/159 (72%), Gaps = 1/159 (0%) 

Query: 29 MIIRNGCLEDLQQVISIEQINFSEAEAASKKAMQERLTIMTDTFLVAEINGRLAGYIEGP 88
M+IR DL+ + +IE NFS EA ++ ++E + ++ DTFLVA I+ + GYIEGP

Sbjct: 1 MLIRQVCGSDLEVIATIESDNFSPQEATTRAVLEEHIRLIPDTFLVALIDQEIVGYIEGP 60

Query: 89 VIKGRYLTDDLFHKVSEFPVRVGGFIGITSLSIHPDFKGQGIGTALLAAMKDLVVSQERD 148

V+ L D LFH V++ P + GG+I ITSLSI F+ QG+GTALLAA+KDLVV+Q+R
Sbjct: 61 VVTTPILEDSLFHGVTKNP-KTGGYIAITSLSIAKHFQQQGVGTALLAALKDLVVAQQRT 119

Query: 149 GISLTCHDDLISFYEMNGFKDEGESDSKHGGSLWYNMIW 187 

G+ LTCHD LIS+YEMNGF ++G S+S+HGG+LWY MIW 
Sbjct: 120 GLILTCHDYLISYYEMNGFINQGISESQHGGTLWYQMIW 158 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1558 

A DNA sequence (GBSx1650) was identified in S.agalactiae <SEQ ID 4811> which encodes the amino
acid sequence <SEQ ID 4812>. This protein is predicted to be UDP-N-acetylmuramate-alanine ligase
(murC/ddlA). Analysis of this protein sequence reveals the following:
Possible site: 31 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.60 Transmembrane 272 - 288 ( 270 - 288) 

Final Results 

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00294 GB:AF008220 putative UDP-N-acetylmuramate-alanine 
ligase [Bacillus subtilis]

Identities = 238/432 (55%) , Positives = 315/432 (72%) , Gaps = 3/432 (0%) 

Query: 5 YHFIGIKGSGMSALALMLHQMGHNVQGSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLE 64
YHF+GIKG+GMS LA +LH G+ VQGSD++K+ FTQ LE+ +TILPFS NI +
Sbjct: 4 YHFVGIKGTGMSPLAQILHDNGYTVQGSDIEKFIFTQTALEKRNITILPFSAENIKPGMT 63

TSFLIGDGTG+G+ N+ YFVFEA EY RHF+ Y P+Y+I+TNIDFDHPDYF+ h

+DVF+AF + A QV KG+

F++PAYG HN+LN+ AVIA + ID +++ LK+F GVKRRF E
Query: 305 IIDDTVIIDDFAHHPTEIIATLDAARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQA 364

+ D V+IDD+AHHPTEI T++AARQKYP +EIVA+FQPHTFTRT LDEFA +LS A
Sbjct: 303 QLGDQVLIDDYAHHPTEIKVTIEAARQKYPDREIVAVFQPHTFTRTQQFLDEFAESLSGA 362 

Query: 365 DSVYLAQIYGSAREVDNGEVKVEDLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDI 424

D VYL I+GSARE + G++ + DL KI ++ L+ ++ S L HD AV +FMGAGDI
Sbjct: 363 DCVYLCDIFGSARE-NAGKLTIGDLQGKI-HNAKLIEEDDTSVLKAHDKAVLIFMGAGDI 420 

Query: 425 QLYERSFEELLA 436

Q Y R++E ++A 
Sbjct: 421 QKYMRAYENVMA 432 

A related DNA sequence was identified in S. pyogenes <SEQ ID 4813> which encodes the amino acid 
sequence <SEQ ID 4814>. Analysis of this protein sequence reveals the following: 

Possible site: 31 



Final Results

bacterial membrane --- Certainty=0.2326 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC00294 GB:AF008220 putative UDP-N-acetylmuramate-alanine 
ligase [Bacillus subtilis] 
Identities = 236/431 (54%) , Positives = 310/431 (71%) , Gaps = 2/431 (0%) 

Query: 5 YHFIGIKGSGMSALALMLHQMGHKVQGSDVEKYYFTQRGLEQAGITILPFSEDNITPDME 64

YHF+GIKG+GMS LA +LH G+ VQGSD+EK+ FTQ LE+ ITILPFS +NI P M
Sbjct: 4 YHFVGIKGTGMSPLAQILHDNGYTVQGSDIEKFIFTQTALEKRNITILPFSAENIKPGMT 63

Query: 65 LIVGNAFRENNKEVAYALRHQIPFKRYHDFLGDFMKSFISFAVAGAHGKTSTTGLLSHVL 124
+I GNAF + + E+ A+ IP RYH FLGD+MK F S AV GAHGKTSTTGLL+HV+

Sbjct: 64 VIAGNAFPDTHPEIEKAMSEGIPVIRYHKFLGDYMKKFTSVAVVGAHGKTSTTGLLAHVI 123

Query: 125 KNITDTSYLIGDGTGRGSANAQYFVFESDEYERHFMPYHPEYSIITNIDFDHPDYFTGIA 184
+N TS+LIGDGTG+G+ N++YFVFE+ EY RHF+ Y P+Y+I+TNIDFDHPDYF+ I
Sbjct: 124 QNAKPTSFLIGDGTGQGNENSEYFVFEACEYRRHFLSYQPDYAIMTNIDFDHPDYFSSID 183

Query: 185 DVRNAFNDYAKQVKKALFVYGEDDELKKIEAPAPIYYYGFEEGNDFIAYDITRTTNGSDF 244

DV +AF + A QV K + G+D+ L KI A P+ YYG E NDF A +I ++T G+ F
Sbjct: 184 DVFDAFQE^IALQVNKGIIACGDDEHLPKIHAOTPWYYGTGEENDFQARNIVKSTEGTTF 243 


Query: 245 KVKHQGEVIGQFHVPAYGKHNILNATAVIANLVVAGIDMALVADHLKTFSGVKRRFTEKI 304

v + F++PAYG HN+LN+ AVIA ID +++ LK+F GVKRRF EK 

Sbjct: 244 DVFVRNTFYDTFYIPAYGHHNVLNSIAVIALC^EEIDSSIIKHALKSFGGVKRRFIEKQ 303

Query: 305 INDTIIIDDFAHHPTEIVATIDAARQKYPSKEIVAIFQPHTFTRTIALLEDFACALNEAD 364

+ D ++IDD+AHHPTEI TI+AARQKYP +EIVA+FQPHTFTRT L++FA +L+ AD
Sbjct: 304 LGDQVLIDDYAHHPTEIKVTIEAARQKYPDREIVAVFQPHTFTRTQQFLDEFAESLSGAD 363 

Query: 365 SVYLAQIYGSAREVDKGEVKVEDLAAKIIKPSQVVTVENVSPLLDHDNAVYVFMGAGDIQ 424
VYL I+GSARE + G++ + DL K I ++++ ++ S L HD AV +FMGAGDIQ

Sbjct: 364 CVYLCDIFGSARE-NAGKLTIGDLQGK-IHNAKLIEEDDTSVLKAHDKAVLIFMGAGDIQ 421

Query: 425 LYEHSFEELLA 435
Y ++E ++A
Sbjct: 422 KYMRAYENVMA 432

An alignment of the GAS and GBS proteins is shown below. 

Identities = 369/443 (83%) , Positives = 406/443 (91%) , Gaps = 1/443 (0%) 
Query: 1 MSKTYHFIGIKGSGMSALALMLHQMGHNVQGSDVDKYYFTQRGLEQAGVTILPFSPNNIS 60
MSKTYHFIGIKGSGMSALALMLHQMGH VQGSDV+KYYFTQRGLEQAG+TILPFS +NI+
Sbjct: 1 MSKTYHFIGIKGSGMSALALMLHQMGHKVQGSDVEKYYFTQRGLEQAGITILPFSEDNIT 60

L+HVLKNITDTS+LIGDGTGRGSANA YFVFE+DEYERHFMPYHPEYSIITNIDFDHPDY

FTG+ DV NAFNDYAKQV+K LF+YGED +L +1 + APIYYYGFE+ NDFIA DITRT

NGSDFKV + E IGQFHVPAYGKHNILNATAVIANL+ + GIDMALVA+HLKTFSGVKRR

FTEKII+DT+IIDDFAHHPTEI+AT+DAARQKYPSKEIVAIFQPHTFTRTIALL++FA A

L++ADSVYLAQIYGSAREVD GEVKVEDLAAKI+K S +VTVENVSPLL+HDNAVYVFMG

AGDIQLYE SFEELLANLTKN Q
Sbjct: 420 AGDIQLYEHSFEELLANLTKNNQ 442

SEQ ID 4812 (GBS157) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 24 (lane 11; MW 49kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 31 (lane 8; MW 74kDa), Figure 33 
(lane 8; MW 74kDa) and Figure 37 (lane 3; MW 74kDa). 
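As a rough plausibility check on these band sizes, a protein's mass can be estimated from its length. This sketch assumes an average residue mass of about 110 Da and a GST tag of about 26 kDa. Both are rule-of-thumb values, not taken from the source. It also takes GBS157 to be roughly 443 residues long, the alignment length reported in the GAS/GBS comparison above. The helper `estimate_kda` is hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # assumed average mass of one amino-acid residue (Da)
GST_TAG_KDA = 26.0      # assumed approximate mass of a GST tag (kDa)


def estimate_kda(n_residues: int, tag_kda: float = 0.0) -> float:
    """Approximate molecular weight in kDa: residues x average mass, plus any tag."""
    return n_residues * AVG_RESIDUE_DA / 1000.0 + tag_kda


if __name__ == "__main__":
    n = 443  # approximate GBS157 length (assumption, see above)
    print(f"His-fusion (tag ignored): ~{estimate_kda(n):.1f} kDa")
    print(f"GST-fusion: ~{estimate_kda(n, GST_TAG_KDA):.1f} kDa")
```

The two estimates (about 48.7 and 74.7 kDa) fall close to the 49 kDa and 74 kDa bands reported. Estimates this coarse are only a sanity check and do not replace the measured gel positions.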

The GBS157-GST fusion product was purified (Figure 112A; see also Figure 200, lane 3) and used to 
immunise mice (lane 1+2 product; 19.5 µg/mouse). The resulting antiserum was used for Western blot
(Figure 112B), FACS, and in the in vivo passive protection assay (Table III). These tests confirm that the 
protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

SEQ ID 4812 (GBS157) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 183 (lane 11-13; MW 74kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1559 

A DNA sequence (GBSx1651) was identified in S.agalactiae <SEQ ID 4815> which encodes the amino
acid sequence <SEQ ID 4816>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1980 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 4817> which encodes the amino acid
sequence <SEQ ID 4818>. Analysis of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2731 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 80/201 (39%) , Positives = 126/201 (61%) , Gaps = 9/201 (4%) 

Query: 7 RFPLIADDEPVMSPLVIC^YDNEDLIITOIRDFYQEKTYQSMVKSNYEHEEISHPKVIEN 66 

+FPL+AD + P +M LY+NEDLI NIR +YQ+K Y + ++ EE + 
Sbjct: 5 QFPLVADGIAISDPAKQMALYENEDLITNIRGYYQDKEYDDIAEN EEFTAKATSRQ 60

Query: 67 DPVPPQ--SFVKKATELSKSRQFAKRSVEKRQAYYAKQEFKAPSKEAFQQQLKATVPKK 124

P + s +K + ++RQ+AK+ ++EKRQAY AK+ P + + +QQ + n . 
Sbjct: 61



Query: 125 QTQRKUTELSHLSDRLQQESYILAEIPIIFQEPDNTPNP-3CTKKNNFDFLKRSQVYNKQD 183 
mh. v L TE+S + +L Q++YILAE+P ++ EP N P TKKNN+DFLK SQ+YN ++

Sbjct: 121 K--QATTEMSRFTKKLHQDNYILAELPKEYKEPKNLPC5GTTKKNNYDFLKSSQIYNNKE 178

Query: 184 NQFHKEPAKAQELNLTRFKDI 204 
+ +E+ AQELNL+RF+D+
Sbjct: 179 MRQQREKTIAQELNLSRFEDL 199

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1560 

A DNA sequence (GBSx1652) was identified in S.agalactiae <SEQ ID 4819> which encodes the amino
acid sequence <SEQ ID 4820>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4959 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1561 

A DNA sequence (GBSx1653) was identified in S.agalactiae <SEQ ID 4821> which encodes the amino
acid sequence <SEQ ID 4822>. This protein is predicted to be SNF. Analysis of this protein sequence 
reveals the following: 
Possible site: 28

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.32 Transmembrane 743 - 759 ( 743 - 759)

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA67095 GB:X98455 SNF [Bacillus cereus] 
Identities = 259/678 (38%) , Positives = 406/678 (59%) , Gaps = 21/678 (3%) 

Query: 369 QNEILLQ^WFDYGINDLTVHNRQELEQLTFASHFKHEEKVFKLLEKYGFAPHFSTSHPAYS 428
+N +L + F YGN + ++ + F K E+++ ++ + FA + ++
Sbjct: 388 tajRLLAGLEFHYGNWINPLEEDGQPSVFNRDEKKEKEILDIMSESAFAKT-EGGYFMHN 446

Query: 429 AQELYDFYTYMLPQFKKMGTV--SLSAKLESYRLIERPQIDIEAKGSL--LDISFDFSDL 484
+ Y+F +++P K+ + + + KL++ PI+K + L FD +
Sbjct: 447 EEAEYNFLYHIVPTLKGLVDIYATTAIKLRIHKGDTAPLIRVRRKERIDWBSFRFDIKGI 506

Query: 485 LENDVDQALVALFDNNPYFVNKSGQLVIFD-EETKKVSATLQ--GLRARRAKNGHIELDN 541
E ++ L AL + Y+ +G L+ + +E .+++ ++ G+R + +
Sbjct: 507 PEAEIKGVLAALEEKRKYYRljANGSLLSLESKEFNEINQFVKESGIRKEFLHGEEVNVPL 566

Query: 542 IAAFQLSELFANQDNVSFSQHFYQLIEDLRHPEKFK--IPGLSVSASLRDYQLTGVRWLS 599
I + + + +S + L+E +++P+K K +P ++ A +R+YQ+ G W+
Sbjct: 567 IRSVKlvMNGLHEGNVLSLDESVQDLVESIQNPKKLKFTOPP-TIiHAVMREYQVYGFEWMK 625

Query: 600 MLDHYGFAGILADDMGLGKTLQTISFLSTKLT--RDSR--VLILSPSSLIYNWQDEFHKF 655
L +Y F GILADDMGLGKTLQ+I+++ + L R+ + +L++SPSSL+YNW E KF
Sbjct: 626 TIAYYRFGGILADDMGLGKTLQSIAYIDSVI^EIREKKLPILVVSPSSLVYNWFSELKKF 685

Query: 656 APDVDVAVAYGSKIRRDEIIAE--RHQVIITSYSSFRQDFETYSEGNYDYLILDEAQVMK 713
AP + +A G++ R +I+ + V+ITSY R+D +Y+ + L LDEAQ K
Sbjct: 686 APHIRAVIADGNQTERRKILKDVAEFDVVITSYPLLRRDVRSYARP-FHTLFLDEAQAFK 744

Query: 714 NAQTKIAHSLRSFEVKNCFALSGTPIENKLLEIWSIFQIIIjPGLLPGKKEFLKIjNPKQVA 773
N T+ A ++++ + + F L+GTP+EN L E+WSIF ++ P LLPG+KEF L + +A
Sbjct: 745 NPTTQTARAVKTIQAEYRFGLTGTPVENSLEELWSIFHWFPELLPGRKEFGDLRREDIA 804

Query: 774 RYIKPFVNRRIKEEVLPELPDLIEMNYFNEMTDSQKVIYLAQLRQI-QESIQHSSDADLN 832
+KPFV+RR KE+VL ELPD IE +E+ QK +Y A L ++ +E+++H L
Sbjct: 805 NAVKPFVLRRLKEDVLQELPDKIEHLQSSELLPDQKRLYAAYLAKLREETLKHLDKDTLR 864

Query: 833 RRKIEILSGITRLRQICDTPRLFMD-YDGESGKLESLRQLLTQIKENGHRALIFSQFRGM 891
+ KI IL+G+TRLRQIC+ P LF+D YGSKLEL +L + + GR LIFSQF M
Sbjct: 865 KNKIRILAGLTRLRQICNHPALFVDDYKGSSAKLEQLLDILEECRSTGKRILIFSQFTKM 924

Query: 892 LDIAEREMVAMGLTTYKITGSTPANERHEMTRAFNAGSKDAFLISLKAGGVGLNLTGADT 951
L I RE+ + + + G+TP+ ER E+ FN G D FLISLKAGG GLNLTGADT
Sbjct: 925 LSIIGRELNRQAIPYFYLDGNTPSQERVELCNRFNEGEGDLFLISLKAGGTGLNLTGADT 984

Query: 952 VVLIDLWWNPAVEMQAISRAHRLGQKENVEVYRLITRGTIEEKILEMQETKKHLVTTVLD 1011
V+L DLWWNPAVE QA RA+R+GQK V+V +L+ GTIEEK+ E+QE+KKHL+ V++
Sbjct: 985 VILYDLWWNPAVEQQAADRAYRIGQKSIVQVIKLVAHGTIEEKMHELQESKKHLIAEVIE 1044

Query: 1012 -GNETHASMSVDDIREIL 1028
Sbjct: 1045 PGEEKLSSITEEEIRDIL 1062



A related DNA sequence was identified in S.pyogenes <SEQ ID 4823> which encodes the amino acid 
sequence <SEQ ID 4824>. Analysis of this protein sequence reveals the following: 
Final Results

bacterial cytoplasm --- Certainty=0.3909 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 674/1031 (65%) , Positives = 834/1031 (80%) , Gaps = 2/1031 (0%)

Query: 1 MSRMIPGRIRNQGIELYEQGLVSLISQEGNLLKAKVGDCQIEYSLVTEETKCSCDFPARK 60
M+R+IPGR+RN+GI+LYEQGLVS +L+ +V Q++Y E+ C CD F K

Sbjct: 1 MARLIPGRVRNEGIKLYEQGLVSFQDDNKGILQIEVETYQVQYGADDEDITCQCDTFHMK 61

Query: 61 GYCQHLAALEHFLKNDPEGKAILSKVQVQQESQQETKKKTSFGSVFLDSLIINEDDTIKY 120

YC+H+AA+E+FLKND +GK L ++ Q + ++ TKK TSFGS+FLDSL +NEDD++KY
Sbjct: 62 HYCKHIAAVEYFLKNDQKGKLFLKQLTNQTKIKETTKKMTSFGSLFLDSLAMNEDDSVKY 121

+LSA G ++P+++D WW+LKI RLPDDRSYVIRDIK FL ++KE +YQIGK YFE hS

+0FD +SQ LIEFLWRL S + K D E I EN RHL L

++HL+ + LE E LY+FKV VHR+SIEL+I EK+++ LF N YL Y+DTFYHL

LKQ KMV AIRSLPIE DLAKHIHFDLDD KLAA L DFK+IGLV4 AP+ SF+ 1 DF+V

F+FD+ +++EI Q+-I-FDYGN V ++ LE L FASH K EEK+ + L +GF+P F

FS ++END+DQA+ ALF NNPYFV+++GQLV+FD+ET+KVS +LQ LRAR+ KNGH++LD
Sbjct: 480 FSTIIENDIDQAVTALFQNNPYFVSQTGQLVVFDDETQKVSKSLQELRARQLKNGHLQLD 539

I A Q+S+LF +V FS+ +L L+HPE F I L V A +RDYQ GV+WLSM
Sbjct: 540 GIRALQVSKLFEGMTSVHFSKELEELAYHLQHPETFSIKPLPVKAQMRDYQRNGVQWLSM 599

L+HYGF GILADDMGLGKTLQT++FL++ L DS+VLILSPSSLIYNW DE KF P +D

V V+YG K RD+II E HQ+ ZTSYSSFRQDFETY +YDYLILDEAQV+KNAQTKI +

H LR+F NCFALSGTP IENK+L3 IWS I FQ J +LPGLLP KKEFLKL +QV+RYIKPFV

Query: 841 GITRLRQICDTPRLFMDYDGESGKLESLRQLLTQIKENGHRALIFSQFRGMLDIAEREMV 900
GITRLRQICDTP LFMDY G+SGKL+SLR LLTQIKENGHRALIFSQFRGMLD+A++EM 
Sbjct: 840 GITRLRQICDTPSLFMDYQGKSGKLDSLRILLTQIKENGHRALIFSQFRGMLDIAKQEMT 899

Query: 901 AMGLTTYKITGSTPANERHEMTRAFNAGSKDAFLISLKAGGVGLNLTGADTVVLIDLWWN 960

A+GLT+Y++TGSTPANER EMTRAFN GSKDAFLISLKAGGVG+NLTGADTV+LIDLWWN
Sbjct: 900 ALGLTSYQMTGSTPANERQEMTRAFKNGSKDAFLISLKAGGVGINLTGADTVILIDLWWK 959

Query: 961 PAVEMQAISRAHRLGQKENVEVYRLITRGTIEEKILEMQETKKHLVTTVLDGNETHASMS 1020

PAVEMQAISRA+R+GQKENVEVYRLITRGTIEEKILE+QE+K++LVTTVLDGNE+ ASMS
Sbjct: 960 PAVEMQAISRAYRIGQKENVEVYRLITRGTIEEKILELQESKRNLVTTVLDGNESRASMS 1019

Query: 1021 VDDIREILGVS 1031 

+++I+EILG++ 
Sbjct: 1020 IEEIKEILGLN 1030 

SEQ ID 4822 (GBS369) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 64 (lane 5; MW 120kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 69 (lane 6; MW 142kDa). 

The GBS369-GST fusion product was purified (Figure 215, lane 7) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 303), which confirmed that the protein is immunoaccessible
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1562 

A DNA sequence (GBSx1654) was identified in S.agalactiae <SEQ ID 4825> which encodes the amino
acid sequence <SEQ ID 4826>. Analysis of this protein sequence reveals the following: 
Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3391 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
There is also homology to SEQ ID 1034: 

Identities = 34/38 (89%) , Positives = 37/38 (97%)

Query: 1 MEKEAKQIIDLKRNLFKIDVRAQKDEEKVFMRTACQFS 38 

+EKEAKQ+IDLKRNLFKIDVRAQKDEEKVFMRTAC+ S 
Sbjct: 1 LEKEAKQMIDLKRNLFKIDVRAQKDEEKVFMRTACRQS 38 
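The Identities and Positives counts can be read off the middle line of each alignment. BLAST prints the residue letter where query and subject agree, '+' for a positive-scoring (conservative) substitution, and a blank otherwise. The sketch below assumes the middle line is intact and spans the full alignment, gap columns included. `midline_counts` is a hypothetical helper.

```python
def midline_counts(midline: str) -> tuple:
    """Return (identities, positives, alignment length) for a BLAST middle line."""
    identities = sum(ch.isalpha() for ch in midline)  # letters mark identical residues
    positives = identities + midline.count("+")       # '+' marks conservative substitutions
    return identities, positives, len(midline)


if __name__ == "__main__":
    mid = "+EKEAKQ+IDLKRNLFKIDVRAQKDEEKVFMRTAC+ S"  # middle line of the SEQ ID 1034 alignment
    ident, pos, length = midline_counts(mid)
    print(f"Identities = {ident}/{length}, Positives = {pos}/{length}")
    # Identities = 34/38, Positives = 37/38
```

For this alignment the positives ratio, 37/38, is about 97%.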

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1563 

A DNA sequence (GBSx1656) was identified in S.agalactiae <SEQ ID 4827> which encodes the amino
acid sequence <SEQ ID 4828>. This protein is predicted to be phosphoglycerate dehydrogenase (serA).
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence
Final Results

bacterial cytoplasm --- Certainty=0.3709 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA88823 GB:AB016077 phosphoglycerate dehydrogenase 
[Streptococcus mutans] 
Identities = 377/436 (86%) , Positives = 414/436 (94%)

Query: 1 MVLPTVAIVGRPNVGKSTLFNRIAGERISIVEDVEGVTRDRIYTTGEWLNRKFSLIDTGG 60

M LPTVAIVGRPNVGKS LFNRIAGERISIVEDVEGVTRDRIYT EWLNR+FS+IDTGG 
Sbjct: 1 MALPTVAIVGRPNVGKSALFNRIAGERISIVEDVEGVTRDRIYTKAEWLNRQFSIIDTGG 60 

Query: 61 IDDVDAPFMEQIKHQADIAMTEADVIVFVVSGKEGVTDADEYVSRILYKTNKPVILAVNK 120

IDDVDAPFMEQIKHQADIAMTEADVIVFVVS KEG+TDADEYV++ILY+T+KPVILAVNK
Sbjct: 61 IDDVDAPFMEQIKHQADIAMTEADVIVFVVSAKEGITDADEYVAKILYRTHKPVILAVNK 120

Query: 121 VDNPEMRNDIYDFYSLGLGDPYPLSSVHGIGTGDILDAIVENLPVEEENENPDIIRFSLI 180 

VDNPEMR+ IYDFY+LGLGDPYP+SS HGIGTGD+LDAIV+NLP E + E+ DII+FSLI 
Sbjct: 121 VDNPEMRSAIYDFYALGLGDPYPVSSAHGIGTGDVLDAIVDNLPAEAQEESSDIIKFSLI 180

Query: 181 GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDTNFVDSQGQEYTMIDTAGMRKSGKVY 240 

GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDT F D +GQE+TMIDTAGMRKSGKVY 
Sbjct: 181 GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDTTFTDEEGQEFTMIDTAGMRKSGKVY 240 

Query: 241 ENTEKYSVMRSMRAIDRSDVVLMVINAEEGIREYDKRIAGFAHETGKGIIIVVNKWDTIE 300

ENTEKYSVMR+MRAIDRSD+VLMV+NAEEGIREYDKRIAGFAHE GKGI++VVNKWD I+
Sbjct: 241 ENTEKYSVMRAMRAIDRSDIVLMVVNAEEGIREYDKRIAGFAHEAGKGIVVVVNKWDAIK 300

Query: 301 KDNHTVSQWEADIRDNFQFLSYAPIIFVSAETKQRLHKLPDMIKRISESQNKRIPSAVLN 360

KDN TV+QWE DIRDNFQ++ YAPI+FVSA TKQRLHKLPD+IK++S+SQN RIPS+VLN 
Sbjct: 301 KDNRTVAQWETDIRDNFQYIPYAPIVFVSAVTKQRLHKLPDVIKQVSQSQNTRIPSSVLN 360

Query: 361 DVIMDAIAINPTPTDKGKRLKIFYATQVAVKPPTFVVFVNEEELMHFSYLRFLENQIREA 420

DV+MDA+AINPTPTDKGKRLKIFYATQV+VKPPTFV+FVNEEELMHFSYLRFLENQIR+A
Sbjct: 361 DVVMDAVAINPTPTDKGKRLKIFYATQVSVKPPTFVIFVNEEELMHFSYLRFLENQIRQA 420

Query: 421 FVFEGTPINLIARKRK 436

FVFEGTPI LIARKRK 
Sbjct: 421 FVFEGTPIRLIARKRK 436 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4829> which encodes the amino acid
sequence <SEQ ID 4830>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3463 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 403/436 (92%) , Positives = 422/436 (96%) 

Query: 1 MVLPTVAIVGRPNVGKSTLFNRIAGERISIVEDVEGVTRDRIYTTGEWLNRKFSLIDTGG 60

MVLPTVAIVGRPNVGKSTLFNRIAGERISIVEDVEGVTRDRIY TGEWLNR+FSLIDTGG
Sbjct: 1 MVLPTVAIVGRPNVGKSTLFNRIAGERISIVEDVEGVTRDRIYATGEWLNRQFSLIDTGG 60

Query: 61 IDDVDAPFMEQIKHQADIAMTEADVIVFVVSGKEGVTDADEYVSRILYKTNKPVILAVNK 120

IDDVDAPFMEQIKHQA IAM EADVIVFVVSGKEGVTDADEYVS+ILY+TN PVILAVNK
Sbjct: 61 IDDVDAPFMEQIKHQAQIAMEEADVIVFVVSGKEGVTDADEYVSKILYRTNTPVILAVNK 120



Query: 121 VDNPEMRNDIYDFYSLGLGDPYPLSSVHGIGTGDILDAIVENLPVEEENENPDIIRFSLI 180
VDNPEMRNDIYDFYSLGLGDPYP+SSVHGIGTGD+LDAIVENLPVEE EN DIIRFSLI
Sbjct: 121 VDNPEMRNDIYDFYSLGLGDPYPVSSVHGIGTGDVLDAIVENLPVEEAEENDDIIRFSLI 180

Query: 181 GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDTNFVDSQGQEYTMIDTAGMRKSGKVY 240

GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDT+F D+ GQE+TMIDTAGMRKSGK+Y
Sbjct: 181 GRPNVGKSSLINAILGEDRVIASPVAGTTRDAIDTHFTDADGQEFTMIDTAGMRKSGKIY 240 

Query: 241 ENTEKYSVMRSMRAIDRSDVVLMVINAEEGIREYDKRIAGFAHETGKGIIIVVNKWDTIE 300

ENTEKYSVMR+MRAIDRSDVVLMVINAEEGIREYDKRIAGFAHE GKG+IIVVNKWDTI+
Sbjct: 241 ENTEKYSVMRAMRAIDRSDVVLMVINAEEGIREYDKRIAGFAHEAGKGMIIVVNKWDTID 300

Query: 301 KDNHTVSQWEADIRDNFQFLSYAPIIFVSAETKQRLHKLPDMIKRISESQNKRIPSAVLN 360

KDNHTV++WEADIRD FQFL+YAPIIFVSA TKQRL+KLPD+IKRISESQNKRIPSAVLN
Sbjct: 301 KDNHTVAKWEADIRDQFQFLTYAPIIFVSALTKQRLNKLPDLIKRISESQNKRIPSAVLN 360

Query: 361 DVIMDAIAINPTPTDKGKRLKIFYATQVAVKPPTFVVFVNEEELMHFSYLRFLENQIREA 420

DVIMDAIAINPTPTDKGKRLKIFYATQV+VKPPTFVVFVNEEELMHFSYLRFLENQIR A
Sbjct: 361 DVIMDAIAINPTPTDKGKRLKIFYATQVSVKPPTFVVFVNEEELMHFSYLRFLENQIRAA 420

Query: 421 FVFEGTPINLIARKRK 436

F FEGTPI+LIARKRK 
Sbjct: 421 FTFEGTPIHLIARKRK 436 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1564 

A DNA sequence (GBSx1657) was identified in S.agalactiae <SEQ ID 4831> which encodes the amino
acid sequence <SEQ ID 4832>. Analysis of this protein sequence reveals the following: 
Possible site: 51 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2734 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00359 GB:AF008220 DnaI [Bacillus subtilis]
Identities = 105/313 (33%) , Positives = 191/313 (60%) , Gaps = 17/313 (5%) 

Query: 1 MKSVGQALENQGRVP--RNTNDELIQMILADAQVAEFIKTHQ--LSQREINISMSKFNQF 56

M+ +G+4-L+ P + +++ + ++ D V F+K ++ + Q+ I S++K ++ 

Sbjct: 1 MEPIGRSI^GVTGRPDFQKRLEQMKEKVMKDQDVQAFLKENEEVIDQKMIEKSLNKLYEY 60 

Query: 57 LIERQK FKNKDSQYIAKGYEPILVMNEGYADVSYLE--TRELIEAQKKQAISDRI 109 

IE+ K ++++ + +GY P LV+M D+ Y E + ++ QKKQ +

Sbjct: 61 -IEQSKNCSYCSEDENC!NNLLEGYHPKLVVNGRSIDIEYYECPVKRKLDQQKKQ--QSLM 117 

Query: 110 NLVNLPKSYRNIRMTDFDINNESRMKAMSQLLDFVETYPSYNH-KGLYLYGDMGVGKSYL 168

+ + + DI++ SR+ + DF+++Y KGLYLYG GVGK+++ 

Sbjct: 118 KSMYIQQDLLGATFQQVDISDPSRLAMFQHVTDFLKSYNETGKGKGLYLYGKFGVGKTFM 177 

Query: 169 MAAMARELSERKGVSTTLLHFPSFAIDVKNAISSGTVKDEIDAVKSVPILILDDIGAEQA 228

+AA+A EL+E++ S+ +++ P F ++KN++ T++++++ VK-H P+L+LDDIGAE 
Sbjct: 178 IAAIANEIAEKE-YSSMIVYVPEFWELKNSLQDQTLEEKIiNMvlCTTPvlMLDDIGAESM 236 

Query: 229 TSWVRDEILQVILQHRMLEELPTFFTSNYSFNDLERKWA-NIKGSDETWQAKRVMERVRY 287

TSWVRDE++ +LQHRM ++LPTFF+SN+S + + +G E +A R+MER+ Y

Sbjct: 237 TSWVRDEVIGTVLQHRMSQQLPTFFSSKFSPDSLKHHFTYSQRGEKEEVKAARLMERILY 296 
Sbjct: 297 LAAPIRLDGENRR 309

A related DNA sequence was identified in S.pyogenes <SEQ ID 4833> which encodes the amino acid 
sequence <SEQ ID 4834>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1944 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 228/300 (76%) , Positives = 264/300 (88%) 

Query: 1 MKSVGQALENQGRVPRNTNDELIQMILADAQVAEFIKTHQLSQREINISMSKFNQFLIER 60
M+ +G+ + G+ R +D+LIQ ILAD +VA FI H LSQ +IN+S+SKFNQFL+ER
Sbjct: 1 MEKIGETMAKLGQNTRVNSDQLIQTILADPEVASFISQHHLSQEQINLSLSKFNQFLVER 60

Query: 61 QKFKNKDSQYIAKGYEPILVMNEGYADVSYLETRELIEAQKKQAISDRINLVNLPKSYRN 120
QK++ KD YIAKGY+PIL MNEGYADVSYLET+EL+EAQK+ AIS+RI LV+LPKSYR+
Sbjct: 61 QKYQLKDPSYIAKGYQPILAMNEGYADVSYLETKELVEAQKQAAISERIQLVSLPKSYRH 120

Query: 121 IRMTDFDINNESRMKAMSQLLDFVETYPSYNHKGLYLYGDMGVGKSYLMAAMARELSERK 180
I ++D D+NN SRM+A S +LDFVE YPS KGLYLYGDMG+GKSYL+AAMA ELSE+K
Sbjct: 121 IHLSDIDVNNASRMEAFSAILDFVEQYPSAEQKGLYLYGDMGIGKSYLLAAMAHELSEKK 180

Query: 181 GVSTTLLHFPSFAIDVKNAISSGTVKDEIDAVKSVPILILDDIGAEQATSWVRDEILQVI 240
GVSTTLLHFPSFAIDVKNAIS+G+VK+EIDAVK+VP+LILDDIGAEQATSWVRDE+LQVI
Sbjct: 181 GVSTTLLHFPSFAIDVKNAISNGSVKEEIDAVKNVPVLILDDIGAEQATSWVRDEVLQVI 240

Query: 241 LQHRMLEELPTFFTSNYSFNDLERKWANIKGSDETWQAKRVMERVRYIAIEFHLEGPNRR 300
LQ+RMLEELPTFFTSNYSF DLERKWA IKGSDETWQAKRVMERVRYLA EFHLEG NRR
Sbjct: 241 LQYRMLEELPTFFTSNYSFADLERKWATIKGSDETWQAKRVMERVRYLAREFHLEGANRR 300

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1565 

A DNA sequence (GBSx1658) was identified in S.agalactiae <SEQ ID 4835> which encodes the amino
acid sequence <SEQ ID 4836>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2660 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4837> which encodes the amino acid
sequence <SEQ ID 4838>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2135 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 217/391 (55%), Positives = 309/391 (78%) 

Query: 1 MMSPIDEFTYIKQNKIVYDSNSLIQLYFPIMGSDAMALYDYFVHFFDDGIRRHKFSEVLN 60

MM PID FTY+K+NK+ DS +LIQLYFPI+GSDA+++Y YF+HFFDDG++RHKFS++LN
Sbjct: 1 MMKPIDTFTYLKENKVTLDSVTLIQLYFPIIGSDAVSIYQYFIHFFDDGLQRHKFSDILN 60

Query: 61 HLQYGMPRFQDALVMLTALDLLTVYQATGTYLVICLNQAMSNELFLSNPIYRRLLEKRIGE 120

HLQ+GM RF+DAL +LTA++L++VYQ + TYL+ L+Q +S +LF +P Y RLLE++IGE
Sbjct: 61 HLQFGMKRFEQALAILTAMELVSVYQLSDTYLITLHQPLSRDLFFQHPAYSRLLEQKIGE 120



Query: 121 VAVAELDMKIPKNARDISKKFTDVFSDLGQPKQSVNRSKNVFDLESFKRLMMRDGLRFNN 180 

VAV+EL + +P AR+ISK+F+D+F G + + FDL SF++LM+RDGL+F +

Sbjct: 121 VAVSELQVTVPSQARNISKRFSDIFGVQGDLTNVPQKPQKNFDLSSFQQLMVRDGLQFED 180

Query: 181 EKDDVX 1 GIYSVSELYH] J NWYDTYQIAKQCAING^1IAPQR^1KVQQNEGQHIKDNQSFTNNE 240 

+ D++ +YS++E Y + W+DTYQ+AK TA+NG I P+R+ ++N+ ++F+ E 

Sbjct: 181 NQIODIISLYSIAEQYDMTWFDTYQ1AKATAVNGKIRPERLLAKKNQSMTKPSKENFSQAE 240 

Query: 241 KVILRESKNDSALVFLEKIKESRKAOTTSGEKTLLEDLAK^FLDEVINVm^YTLNKTK 300 

++ILRE+K DSALVFLEKIK++R+A T E+ LL+ LAKMNFLD+VINVMVLYT NKTK 
Sbjct: 241 Q 1 1 LREAKQDSALVFLEKI KKARRAT ITKDERILIiQTLAKMNFLDDVINVMVLYTFNKTK 300 

Query: 301 SAmiNKAYIMKVAIIDFAFQNVMTAEDAVLKIRDFSDQKVRTKTETKKKQSNVPEWSNPDY 360 

SANL K+Y++K+ANDFA+Q V TAE+A++ +R F+D++ R +++ K QSNVP+WSNPDY 
Sbjct: 301 £ 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1566 

A DNA sequence (GBSx1659) was identified in S.agalactiae <SEQ ID 4839> which encodes the amino
acid sequence <SEQ ID 4840>. Analysis of this protein sequence reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4485 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06865 GB:AP001517 unknown conserved protein [Bacillus halodurans] 
Identities = 80/150 (53%) , Positives = 115/150 (76%) 

Query: 1 MRCPKCGYNKSSVVDSRQAEEGTTIRRRRECEKCGNRFTTFERLEELPLLVIKKDGTREQ 60

MRCP C +N + V+DSR A EG +IRRRRECE C +RFTTFE +EE+PL+V+KKDGTR++
Sbjct: 1 MRCPACHHNGTRVLDSRPAHEGRSIRRRRECESCNKRFTTFEMIEEVPLIVVKKDGTRQE 60

Query: 61 FSRDKILNGIIQSAQKRPVSSEDIENCILRIERKIRSEYEDEVSSITIGNLVMDELAELD 120

FS DKIL G+I++ +KRPV E +E + +ER++R + ++EV S IG LVM+ IA +D 
Sbjct: 61 FSSDKILRGLIRACEKRPVPLETLEGIVNEVBRELRGQGKNEVDSKEIGELvTffiRLANVD 120 



Query: 121 EITYVRFASVYKSFKDVDEIEELLQQITKR 150 
++ YVRFASVY+ FKD++ + L+++ +R

Sbjct: 121 DVAYVRFASVYRQFKDINVFIQELKELMER 150 
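Each summary line pairs a raw count with a percentage, so the two can be cross-checked. The Python sketch below is illustrative only: the regular expression and the `check_summary` name are hypothetical, and it assumes the printed figure is the count divided by the alignment length and truncated to a whole percent, which is consistent with 115/150 being reported as 76% in the summary above (rounding would give 77%). A disagreement between printed and recomputed values only flags a figure worth re-checking against the original record; it does not establish which number is wrong.

```python
import re

# Matches fields such as "Identities = 80/150 (53%)" in a BLAST-style summary line.
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_summary(line):
    """Map each field to (count, length, printed_pct, recomputed_pct).

    Assumption: the printed percentage is 100 * count / length,
    truncated (not rounded) to an integer.
    """
    result = {}
    for name, count, length, pct in FIELD.findall(line):
        count, length, pct = int(count), int(length), int(pct)
        result[name] = (count, length, pct, (100 * count) // length)
    return result


if __name__ == "__main__":
    summary = "Identities = 80/150 (53%) , Positives = 115/150 (76%)"
    for name, (n, total, printed, recomputed) in check_summary(summary).items():
        flag = "ok" if printed == recomputed else "re-check"
        print(f"{name}: {n}/{total} printed {printed}%, recomputed {recomputed}% ({flag})")
```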
A related DNA sequence was identified in S.pyogenes <SEQ ID 4841> which encodes the amino acid
sequence <SEQ ID 4842>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4365 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 131/155 (84%) , Positives = 143/155 (91%) 

Query: 1 MRCPKCGYNKSSVVDSRQAEEGTTIRRRRECEKCGNRFTTFERLEELPLLVIKKDGTREQ 60
+RCPKC Y+KSSVVDSRQAE+G TIRRRRECE+C RFTTFER+EELPLLVI KKDGTREQ

Sbjct: 1 



Query: 61 

FSRDKILNG++QSAQKRPVSS DIEN I RIE+++R+ YE+EVSS IGNLVMDELAELD 
Sbjct: 61 FSRDKILNGVVQSAQKRPVSSTDIENVISRIEQEURTTYENEVSSTAIGNLVMDELAELD 120

Query: 121 EITYVRFASVYKSFKDVDEIEELLQQITKRWSKK 155 

EITYVRFASVYKSFKDVDEIEELLQQIT RVR KK
Sbjct: 121 EITYVRFASVYKSFKDVDEIEELLQQITNRVRGKK 155

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1567 

A DNA sequence (GBSx1660) was identified in S.agalactiae <SEQ ID 4843> which encodes the amino
acid sequence <SEQ ID 4844>. This protein is predicted to be CsrS (mtrB). Analysis of this protein 
sequence reveals the following: 



Possible site: 35 
>>> Seems to have r 



INTEGRAL Likelihood =-11.30 Transmembrane 22 - 
INTEGRAL Likelihood = -9.66 Transmembrane 189 - 



Final Results 

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2109> which encodes the amino acid
sequence <SEQ ID 2110>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.32 Transmembrane 196 - 212 ( 189 - 214)

Final Results

bacterial membrane --- Certainty=0.3527 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 248/501 (49%), Positives = 363/501 (71%), Gaps = 4/501 (0%)

Query: 1 MKNKECDQFIGVKQPLSKKLSQLVFILFFSLFTVFSVLVYTSATRYVLHREKINVGRSLEK 60

M+N+K + K L K+LS + F+LFF +F+ F+++ Y+S ++L +EK +V +++ 
Sbjct: 1 MENQKQKQKKYKNSLPKRLSNIFFVLFFCIFSAFTLIAYSSTNYFLLKKEKQSVFQAVNI 60 

Query: 61 TRVRLSQANSSLTSDDILEILYNQYFADDIYPHKRQNGIVRTGESIDSILYVNQEMTLYD 120

RVRLS+ +S+ T +++ E+LY ++ + ++R+ I + h NQ++ +Y+ 

Sbjct: 61 TOTOLSEVDSNFTLFJSI^VLYKimKTHLRIDDRKGSRVIRSERDITNTLDAMQDIYVYN 120 

Query: 121 VNRICPVFST-LRTGMPTIGKSMGKVIISKVADM-EGFVGTKAIYEQKTGQLLGYVQIFYN 178 

++++ +F+T P + +G+V + D GF T+ +YS +TG+ +GYVQ+F++ 

Sbjct: 121 IDKQMIFTTDNEESSPGLHGPIGRVYHDHIEDQYRGFSMTQKVYSNRTGKFVGYVQVFHD 180 

Query: 179 LGRYYS^KQNIIVFLI^*ffiVLGTVIALVVINSATKRIVRPVKNLHDLMHQISENPSNLEI 238 

LG YY +R +4- +L+++E+ GT LA ++I T+R ++P+ NLH++M ISENP+NL + 
Sbjct: 181 LGNYWIRARLLFWLLWELFGTSLAYLIILITTRRFLKPLHNLHEVMRNISENPNNLNL 240 

Query: 239 RSKVRSEDEIGELSRIFDGMLDQLEDYTRRQSQFISDVSHELRTPVAVVKGHIGLLQRWG 298

RS + S DEI ELS IFD MLD+LE +T+ QS+FISDVSHELRTPVA++KGHIGLLQRWG 
Sbjct: 241 RSDISSGDEIEELSVIFDNMLDKLETHTICLQSRFISDVSHELRTPVA1IKGHIGLLQRWG 300 

Query: 299 KDDPEILEESLAAAYHEADRMSLMINDMLNMIRVQGSLELHQDEVTDLSSSISVVIENFR 358

KDD +ILEESL A HEADRM++MINDML+MIRVQGS E HQ+++T L SI V+ NFR
Sbjct: 301 KDDSDILEESLTATAHEADRMAIMINDMLDMIRVQGSFEGHQNDMTVLEDSIETVVGNFR 360

Query: 359 ILREDFQFIFENNISDIVWGKIYKIHFEQALMILIDNAIKYSPSYKEVSWLSVDNDFAT 418 

+LREDF F +++ + +IYK HFEQALMILIDNA+KYS K++++ LSV 

Sbjct: 361 VLREDFIFTWQSENPKTI-ARIYKNHFEQALMILIDNAVKYSRKEKKIAINLSVTGKQEA 419

Query: 419 W-VKDKGEGISDEDIEFIFDRFYRTDKSRNRESTQAGLGIGLSVFKQIMDAYHLKVDIK 477 

+V V+DKGEGIS EDIE I F+RF YRTDKSRNR STQAGLGIGLS+ KQI+D YHL++ ++ 
Sbjct: 420 IVRVQDKGEGISKEDIEHIFERFYRTDKSRNRTSTQAGLGIGLSILKQIVDGYHLQMKVE 479 



Sbjct: 480 SELNEGSVFILHIPLAQSKES 500 

A related GBS gene <SEQ ID 8845> and protein <SEQ ID 8846> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
SRCFLG: 0
McG: Length of UR: 5 

Peak Value of UR: 0.74 
Net Charge of CR: 2 
McG: Discrim Score: -10.19 
GvH: Signal Score (-7.5): -3.66 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence
Amino Acid Composition: calculated from 1 
ALOM program count: 2 value: -11.30 threshold: 0.0 

INTEGRAL Likelihood =-11.30 Transmembrane 22 - 38 ( 18 - 43) 
INTEGRAL Likelihood = -9.66 Transmembrane 189 - 205 ( 187 - 212) 
PERIPHERAL Likelihood = 2.86 405 
modified ALOM score: 2.76 
icml HYPID : 7 CFP: 0.552 

*** Reasoning Step: 3 



Final Results

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
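The ALOM summary line reported for this protein ("ALOM program count: 2 value: -11.30 threshold: 0.0") lines up with its INTEGRAL rows: two segments score below the 0.0 threshold, and the most negative likelihood is -11.30. The same relationship holds for the other entries in this section (count 3 and -11.30 for SEQ ID 8848, count 1 and -15.44 for SEQ ID 8850). The Python sketch below assumes that relationship and only re-derives it from the printed text. It is not a reimplementation of PSORT or ALOM, and `alom_summary` is a hypothetical helper.

```python
import re

# Pattern for rows such as
# "INTEGRAL Likelihood =-11.30 Transmembrane 22 - 38 ( 18 - 43)"
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def alom_summary(lines, threshold=0.0):
    """Collect INTEGRAL segments whose likelihood is below `threshold`.

    Returns (count, lowest_likelihood, segments); each segment is
    (likelihood, start, end, extended_start, extended_end).
    Non-matching lines (e.g. PERIPHERAL rows) are ignored.
    """
    segments = []
    for line in lines:
        m = INTEGRAL.search(line)
        if m:
            likelihood = float(m.group(1))
            start, end, ext_start, ext_end = (int(g) for g in m.groups()[1:])
            if likelihood < threshold:
                segments.append((likelihood, start, end, ext_start, ext_end))
    lowest = min((s[0] for s in segments), default=None)
    return len(segments), lowest, segments


if __name__ == "__main__":
    report = [
        "INTEGRAL Likelihood =-11.30 Transmembrane 22 - 38 ( 18 - 43)",
        "INTEGRAL Likelihood = -9.66 Transmembrane 189 - 205 ( 187 - 212)",
        "PERIPHERAL Likelihood = 2.86 405",
    ]
    count, lowest, _ = alom_summary(report)
    print(f"ALOM count: {count} value: {lowest:.2f}")
```

Keeping the threshold as a parameter mirrors the "threshold: 0.0" field in the report rather than hard-coding it.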
SEQ ID 8846 (GBS321) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 173 (lane 6; MW 84kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 80 (lane 2; MW 58.7kDa). 

GBS321-GST was purified as shown in Figure 220, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1568 

A DNA sequence (GBSx1661) was identified in S.agalactiae <SEQ ID 4845> which encodes the amino
acid sequence <SEQ ID 4846>. This protein is predicted to be CsrR (trcR). Analysis of this protein
sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2649 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3259> which encodes the amino acid
sequence <SEQ ID 3260>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3226 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 193/229 (84%) , Positives = 211/229 (91%) , Gaps = 1/229 (0%) 

Query: 1 MGKKILIIEDEKNLARFVSLELLHEGYDVVVETNGREGLDTALEKDFDLILLDLMLPEMD 60 

M KKILIIEDEKNLARFVSLEL HEGY+V+VE NGREGL+TALEK+FDLILLDLMLPEMD 
Sbjct: 1 MTKKILIIEDEKNLARFVSLELQHEGYEVIVEVNGREGLETALEKEFDLILLDLMLPEMD 60 

Query: 61 GFEITRRLQAEKTTYIMMMTARDSVMDIVAGLDRGADDYIVKPFAIEELLARVRAIFRRQ 120

GFE+TRRLQ EKTTYIMMMTARDS+MD+VAGLDRGADDYIVKPFAIEELLAR+RAIFRRQ
Sbjct: 61 GFEVTRRLQTEKTTYIMMMTARDSIMDVVAGLDRGADDYIVKPFAIEELLARIRAIFRRQ 120



Query: 121 E 

+IE++ K+ G +RDL IoN NRS RGD+EISLTKRE+DLLN+LMTNMNRVMTREEL 
Sbjct: 121 DIESE-KIOTPSQGIYPJDLvIaNPQNRS^MRGDDEISLTKP.EYDLLNIIjMTNMNRVMTREEL 179 

Query: 181 LEHWKYDVAAE1WVDVYIRYLRGKIDIPGRESYIQTVRGMGYVIREK 229 

L +VWKYD A ETNWDWIRYLRGKIDIPG+ESYIQTVRGMGYVIREK 
Sbjct: 180 LSNVWKYDEAVETNWDVYIRYLRGKIDIPGKESYIQTVRGMGYVTREK 228

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1569 

A DNA sequence (GBSx1662) was identified in S.agalactiae <SEQ ID 4847> which encodes the amino
acid sequence <SEQ ID 4848>. Analysis of this protein sequence reveals the following: 
3 N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3864 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

: hypothetical protein [Streptococcus gordonii] 
.tives = 133/174 (76%) , Gaps = 3/174 (1%) 

Query: 3 LTEIKKSPEGLYFDKKIDIKESLMERHSEIMDISDIQVSGHVVYEDGLYLLDYNMAYDIT 62
+ EI+K+P+GL F+KK+D+ E L ER++EI+D+ DI SG YEDGLY LDY ++Y IT
Sbjct: 4 IQEIRKNPDGLAFEKKLDLAEELKERNAEILDVQDIVASGRAQYEDGLYFLDYELSYTIT 63

Query: 63 LPSSRSMKPVVLSEKQTINEVFIEAENVSTKKELVDQELVLILEEDDINLEESVIDNILL 122

L SSRSM+PV E +NE+F+E V++ +E++DQ+LVL +E +IN+ ESV DNILL
Sbjct: 64 LASSRSMEPVERKESYLVNEIFMEDGQVAS-QEMIDQDLVLPIENGEINVAESVADNILL 122

Query: 123 NIPLRVL-AADEVGVEADLSGKNWSLMTEKQYEEKQAKEKEKSNPFAALEGMFD 175 

NIPL+VL AA+E G + +G++W +MTE Y++ QA++KE+++PFA L+G+FD 
Sbjct: 123 NIPLKVLTAAEEAGSDLP-TGRDWQVMTEDDYQKYQAEKKEENSPFAGLQGLFD 175

A related DNA sequence was identified in S.pyogenes <SEQ ID 4849> which encodes the amino acid
sequence <SEQ ID 4850>. Analysis of this protein sequence reveals the following: 
Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3032 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 8S/175 (49%), Positives = 135/175 (77%)

Query: 1 MLLTEIKKSPEGLYFDKKIDIKESLMERHSEIMDISDIQVSGHVVYEDGLYLLDYNMAYD 60
+ ++EI+K P+GL FD+ D+K L+ER +I+DI ++ G+V Y+ GLYLLDY ++Y+
Sbjct: 3 LAISEIRKHPDGLSFDRLCDVKSMLLERDQQIIDIKAVKAVGNVRYDKGLYLLDYQLSYE 62

Query: 61 ITLPSSRSMKPVVLSEKQTINEVFIEAENVSTKKELVDQELVLILEEDDINLEESVIDNI 120 

+ LPSSRSM PV LSE Q I E+FIEA +++ KKELV+ LVL+L++D INLEES++DNI 
Sbjct: 63 VILPSSRSrWPVCLSEVQHIQELFIEATDLADKKELVEDNLVLVLDKDAINLEESIVDNI 122 

Query: 121 LLNIPLRVLAADEVGVEADLSGKNWSLMTEKQYEEKQAKEKEKSNPFAALEGMFD 175

LL IP++VL +E + +G+NW+++TE+ Y+ + ++++++NPFA+L+G+FD
Sbjct: 123 LLAIPVQVLTEEEKKSKELPAGQNWAVLTEEDYQCLKEEKQKENNPFASLQGLFD 177

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1570 

A DNA sequence (GBSx1663) was identified in S.agalactiae <SEQ ID 4851> which encodes the amino
acid sequence <SEQ ID 4852>. This protein is predicted to be heat shock protein (htpX). Analysis of this
protein sequence reveals the following:

Possible site: 25

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-11.30 Transmembrane 195 - 211 ( 190 - 221) 
Final Results

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB70525 GB:AF017421 putative heat shock protein HtpX 
[Streptococcus gordonii] 
Identities = 220/297 (74%), Positives = 261/297 (97%), Gaps = 1/297 (0%) 

Query: 1 MLYQQIASNKRKTVVLLIVFFCLLAAIGAAVGYLVLGSYQFGLVLALIIGVIYAVSMIFQ 60
ML++QIA+NKR+T LL+ FF LLA IGAA GYL + S G+++A IIG+IYA++MIFQ
Sbjct: 1 MLFEQIAANKRRTWFLLVAFFALLALIGAAAGYLWMNSPLGGVIIAFIIGLIYAITMIFQ 60

AVAATTGLL +MNREELEGVIGHEVSHIRNYDIRISTIAVALAS

GMI AL+KLD SEPM VDDASAALYI+DP KK GL+ LFYTHPPI++R+ERLR M
Sbjct: 241 GMIRALQKLDNSEPMHRHVDDASAALYISDPKKKGGLQKLFYTHPPISERVERLRKM 297

A related DNA sequence was identified in S.pyogenes <SEQ ID 4853> which encodes the amino acid
sequence <SEQ ID 4854>. Analysis of this protein sequence reveals the following:

Possible site: 31
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -9.77 Transmembrane 197 - 213 ( 192 - 223)
INTEGRAL Likelihood = -8.33 Transmembrane 43 - 59 ( 33 - 61)
INTEGRAL Likelihood = -3.82 Transmembrane 153 - 159 ( 153 - 174)

Final Results

bacterial membrane --- Certainty=0.4906 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB70525 GB:AF017421 putative heat shock protein HtpX [Streptococcus gordonii] 
Identities = 208/298 (69%) , Positives = 257/298 (85%) , Gaps = 1/298 (0%) 

Query: 1 MLYQQISQNKQRTVVLLVGFFALLALIGASAGYLLLDNYAMGLVLALVIGVIYATSMIFQ 60 

ML++QI+ NK+RT LLV FFALLALIGA+AGYL +++ G+++A +IG+IYA +MIFQ 
Sbjct: 1 MLFEQIAANKRRTWFLLVAFFALLALIGAAAGYLWMNSPLGGVIIAFIIGLIYAITMIFQ 60 

Query: 61 STSLVMSMNNAREVTEKFAPGFFHIVEDMAMVAQIPMPRVFIIEDPSLNAFATGSSPQNA 120

ST +VMSMN AR+V+E+EAP +HI V+DMAMVAQ I PMPRV+ 1 +ED S NAFATGS+P+NA 
Sbjct: 61 STEVVMSMNGARQVSEQEAPELYHIVQDMAMVAQIPMPRVYIVEDDSPNAFATGSNPENA 120

Query: 121 AVAATTGLLEVMNREELEGVIGHEISHIRNYDIRISTIAVALASAVTVISSIGGRMLWYG 180 

AVAATTGLL +MNREELEGVIGHE+SHIRNYDIRISTIAVALASA+T+ISS+ GRM+WYG
Sbjct: 121 AVAATTGLLRLMNREELEGVIGHEVSHIRNYDIRISTIAVALASAITMISSVAGRMMWYG 180

Sbjct: 181 GG-RW^RDDDSGLGLLMLVFSLIAIirAPL^TLVQLaiSRQREFLADASSVELTRNP 239

Query: 241 QGMIKALEKLQLSQPMKHPVDDASAALYINEPRKKRSFSSLFSTHPPIEERIERLKNM 298

QGMI+AL+KL S+PM VDDASAALYI++P+KK LF THPPI ER+ERL+ M 

Sbjct: 240 QGMIRALQKLDHSEPMHRHVDDASAALYISDPKKKGGLQKLFYTHPPISERVERLRKM 297

An alignment of the GAS and GBS proteins is shown below. 

Identities = 233/298 (78%), Positives = 262/298 (87%), Gaps = 2/298 (0%)

Query: 1 MLYQQIASNKRKTVVLLIVFFCLLAAIGAAVGYLVLGSYQFGLVLALIIGVIYAVSMIFQ 60

MLYQQI+ NK++TVVLL+ FF LLA IGA+ GYL+L +Y GLVLAL+IGVIYA SMIFQ
Sbjct: 1 MLYQQISQNKQRTVVLLVGFFALLALIGASAGYLLLDNYAMGLVLALVIGVIYATSMIFQ 60

Query: 61 STOVVMSMNNAREVTEDEAPOTraiTO^ 120 
ST++VMSMNNAREVTE EAP +FHIVEDMAK+AQIPMPRVFI+ED SLHAFATGS P+HA 

Sbjct: 61 



Query: 121 AVAATTGLIAVMNREELEGVIGHEVSHIRNYDIRISTIAVALASAVTLISSIGSRMLFYG 180 

AVAATTGLL VMNREELEGVIGHE+SH1RNYDIRISTIAVALASAVT+ISSIG RML+YG 
Sbjct: 121 AVARTTGLLEV^EELEGVIGHEISHIRNYDIRISTIAVALASAVTVISSIGGRMLWYG 180 

. Query: 181 GG--RRRDDDREDGGNILVLIFSILSI.1IAPIAASLVQIAISRQREYLADASSVELTRNP 238 
GG R+RDD +D 1+ L+ S+LSL+LAPL ASL+QLAISRQREYLADASSVELTRNP 
Sbjct: 181 GGSRRQRDDGDDDVLRIITLLLSLLSLLIAPLVASL1Q1A1SRQREYLADASSVELTRNP 240 

Query: 239 QGMISALEKLDRSEPMGHPVDDASAAIiYINDPTKKEGLKSIiFYTHPPIADRIERLRHM 296 

QGMI ALEKL S+PM HPVDDASAALYIN+P KK SLF THPPI +RIERL++M 
Sbjct: 241 QGMIKALEKLQLSQPMKHPVDDASAALYINEPRKKRSFSSLFSTHPPIEER1ERLKNM 298 

A related GBS gene <SEQ ID 8847> and protein <SEQ ID 8848> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 

McG: Discrim Score: 9.61 

GvH: Signal Score (-7.5): -0.97
Possible site: 25 

>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 3 value: -11.30 threshold: 0.0 

INTEGRAL Likelihood =-11.30 Transmembrane 195 - 211 ( 190 - 221) 
INTEGRAL Likelihood =-11.09 Transmembrane 43 - 59 ( 31 - 62) 
INTEGRAL Likelihood = -3.61 Transmembrane 153 - 169 ( 153 - 174) 
PERIPHERAL Likelihood = 5.89 87 
modified ALOM score: 2.76 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

73.8/88.3% over 296aa 

imported 

SP|O30795|PUTATIVE HEAT SHOCK PROTEIN HTPX. Insert characterized

GP|2407215|gb|AAB70525.1||AF017421 putative heat shock protein HtpX {Streptococcus
gordonii} Insert characterized 

PIR|T48855|T48855 probable heat shock protein HtpX - Streptococcus gordonii Insert 
characterized 

ORF02338(301 - 1188 of 1488) 

SP|O30795|HTPX_STRGC(1 - 297 of 297) PUTATIVE HEAT SHOCK PROTEIN

HTPX. GP|2407215|gb|AAB70525.1||AF017421 putative heat shock protein HtpX {Streptococcus
gordonii} PIR|T48855|T48855 probable heat shock protein HtpX [imported] - Streptococcus
gordonii

%Match =44.0

% Identity =73.7 %Similarity =88.2 

Matches =219 Mismatches = 34 Conservative Sub.s = 43 



141 171 201 231 261 291 321 351

NFLPTSVI*fflOTIQL*CEIRNFPK*YCWKTIWQTKPILRNS*RRKRSAKSFL*LLIEKGERLLLYQQIASNKRKTVVLL

MLFEQIAANKRRTWFLL

381 411 441 471 501 531 561 591

IVFFCLLAAIGAAVGYLVLGSYQFGLVLALIIGVIYAVSM

VAFFALLALIGAAAGYLWMNSPLGGVIIAFIIGLIYAITMIFQSTEVVMSMNGARQVSEQEAPELYHIVQDMAMVAQIPM
30 40 50 60 70 80 90

621 651 681 711 741 771 801 831

PRWIVEDDSLNAFATGSKPENAAVAATTGLIAWffiEELEGVIGE^^

PRVYIVEDDSPNAFATGSNPENAAVAATTGLLRLMNREELEGVIGHEVSHIRNYDIRISTIAVALASAITMISSVAGRMM
110 120 130 140 150 160 170

861 888 918 948 978 1008 1038 1068

FYGGGRRRDDDREDGG-NILVLIFSILSLIIAPLAASLVQLAISRQREYLADASSVELTRNPQGMISALEKLDRSEPMGH

I^GGGPJJPJfflRDDDSGLGLLMLVFSLIAIILAPLAATLVQlAAISRQREFLADASSVELTRNPQGMIPJVLQKLDNSEPMH
190 200 210 220 230 240 250

1098 1128 1158 1188 1218 1248 1278 1308

PVDDASAALYINDPTKKEGLKSLFYTHPPIADRIERLRHM*SLTKRRVAI4?OT,FF*DiaCKT*YNMTYTlKGDGTCYLQ

HVDDASAALYISDPKKKGGLQKLFYTHPPISERVERLRKM
270 280 290

SEQ ID 8848 (GBS179) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 175 (lane 11; MW 58kDa).

GBS179-GST was purified as shown in Figure 227, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1571 

A DNA sequence (GBSx1665) was identified in S.agalactiae <SEQ ID 4855> which encodes the amino
acid sequence <SEQ ID 4856>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-15.44 Transmembrane 4 - 20 ( 1-27) 



Final Results

bacterial membrane --- Certainty=0.7177 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MGTMILIAIIALFVIWLIVAYNSLVRSRMHTKESWSQIDVQLKRRNDLIPNLIETVKGYA 60

M +I IA+I + V+++I YNSLVR+RM T+E+WSQIDVQLKRRNDL+PNLIETVKGY
Sbjct: 1 MSFIITIAVIWIVLWISVYNSLVRARMQTQEAWSQIDVQLKRRNDLLPNLIETVKGYG 60

Query: 61 AYEGKTLEKIAELRAQVAKANTPAEAMTASNELTRQLSSIIAVAENYPDLKANNSFVKLQ 120 
YE TLEK+ +LRAQVA A++PA+AM AS+ LTRQ+S I AVAE+YPDLKAN +++KLQ
Sbjct: 61 KyEQATLEKOTQLRAQVASASSPADAMKASDALTRQISCSIPAVAESYPDLKANENYLKLQ 120 

Query: 121 EELTNTENKISYSRQLYHTTTSNYNVKLETFPSNIVGKLFGFKPSQFLETPEEEKEVPKV 180 
EELTNTENKISYSRQLYN+ NYNVKL+ FPSN++ +F F+P+ FL TPEEEK VPKV

Sbjct: 121 EELTNTENKISYSRQLYNSVAGNYNVKLQAFPSNVIAGMFAFRPADFLSTPEEEKAVPKV 180

Query: 181 SF 182
F

Sbjct: 181 DF 182

A related DNA sequence was identified in S.pyogenes <SEQ ID 4857> which encodes the amino acid 
sequence <SEQ ID 4858>. Analysis of this protein sequence reveals the following: 

Possible site: 15

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC44350 GB:U66186 LemA [Listeria monocytogenes]
Identities = 91/181 (50%), Positives = 121/181 (66%), Gaps = 2/181 (1%)

Query: 5 LIILVVLGVLALWLMISYNSLVKSRMHTKEAWSQIDVQLKRRNDLIPNLIETVKGYASYE 64

+I + V+ +L L YNSLVK R E W+QIDVQLKRR DLIPNL+ETVKGYA +E

Sbjct: 5 IIAIAVWILVXjlYFGliYNSLVICYRNRVDETWAQIDVQLKRRFDLIPI^vETVKGYAKHE 64 

Query: 65 QKTFEKITDLRARVAN--ASTPCETMAASNELSKQVTSLFAVAENYPDLKANENFLKLQE 122

++T ++ + R ++ A Q + A N LS + S+FA+ E YPDLKAN +F++LQ
Sbjct: 65 KETLTQVIEARNKMMEVPADNRQGQIEADNMLSGALKSIFALGEAYPDLKANTSFIELQH 124

Query: 123 ELTNTENKISYSRQLYNSTTSNYNLQLESFPSNIAGKLFGFKPSEFLQTPEAEKEVPKVEF 183

ELT TENK++YSRQLYN+T YN +++S P+NI KL F + L PE E+ PKVEF
Sbjct: 125 ELTTTENKVAYSRQLYNTTVMTYNTKVQSVPTNIVAKLHNFTERDMLSIPEVERVAPKVEF 185

An alignment of the GAS and GBS proteins is shown below.

Identities = 135/181 (74%), Positives = 165/181 (90%)

Query: 4 MILIAIIALFVIWLIVAYNSLVRSRMHTKESWSQIDVQLKRRNDLIPNLIETVKGYAAYE 63

+ I++ ++ + +WL+++YNSLV+SRMHTKE+WSQIDVQLKRRNDLIPNLIETVKGYA+YE
Sbjct: 5 LIILVVLGVLALWLMISYNSLVKSRMHTKEAWSQIDVQLKRRNDLIPNLIETVKGYASYE 64

Query: 64 GKTLEKIAELRAQVAKANTPAEAMTASNELTRQLSSILAVAENYPDLKANNSFVKLQEEL 123

KT EKI +LRA+VA A+TP E M ASNEL++Q++S+ AVAENYPDLKAN +F+KLQEEL
Sbjct: 65 QKTFEKITDLRARVANASTPQETMAASNELSKQVTSLFAVAENYPDLKANENFLKLQEEL 124

Query: 124 TNTENKISYSRQLYNTTTSNYNVKLETFPSNIVGKLFGFKPSQFLETPEEEKEVPKVSFDF 184

TNTENKISYSRQLYN+TTSNYN++LE+FPSNI GKLFGFKPS+FL+TPE EKEVPKV F+F
Sbjct: 125 TNTENKISYSRQLYNSTTSNYNLQLESFPSNIAGKLFGFKPSEFLQTPEAEKEVPKVEFNF 185

A related GBS gene <SEQ ID 8849> and protein <SEQ ID 8850> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 0
McG: Discrim Score: 14.53
GvH: Signal Score (-7.5): -3.19
Possible site: 20
>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 1 value: -15.44 threshold: 0.0 

INTEGRAL Likelihood =-15.44 Transmembrane 4- 20 ( 1- 27) 



WO 02/34771 






PCT/GB01/04789 



PERIPHERAL Likelihood = 8.86 146 
modified ALOM score: 3.59 

*** Reasoning Step: 3 

Final Results

bacterial membrane
bacterial outside
bacterial cytoplasm

The protein has homology with the following sequences in the databases: 

51.4/68.9% over 183aa 

Listeria monocytogenes 
EGAD|149857|LemA protein Insert characterized
GP|1519287|gb|AAC44350.1||U66186 LemA Insert characterized

ORF01545(301 - 846 of 1152)

EGAD|149857|159923 (2 - 185 of 185) LemA protein {Listeria monocytogenes}
GP|1519287|gb|AAC44350.1||U66186 LemA {Listeria monocytogenes}
%Match =23.8

%Identity =51.4 %Similarity =68.9 

Matches = 94 Mismatches = 56 Conservative Sub.s = 32 
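The %Identity and %Similarity figures in these database comparisons can be reproduced from the match counts if similarity is taken as identical plus conservative positions. The report does not state the denominator. Dividing by 297 (the full length of the S. gordonii HtpX entry, "1 - 297 of 297") and by the quoted 183 residues for LemA reproduces the printed values, so those lengths are an assumption of this Python sketch rather than a documented rule of the comparison program. The `identity_similarity` function is hypothetical.

```python
def identity_similarity(matches, conservative, length):
    """Return (percent identity, percent similarity) over `length` positions.

    Similarity counts identical plus conservative substitutions; both values
    are rounded to one decimal place, matching how the report prints them.
    """
    identity = round(100.0 * matches / length, 1)
    similarity = round(100.0 * (matches + conservative) / length, 1)
    return identity, similarity


if __name__ == "__main__":
    # HtpX: Matches = 219, Conservative Sub.s = 43, assumed length 297
    print(identity_similarity(219, 43, 297))  # (73.7, 88.2)
    # LemA: Matches = 94, Conservative Sub.s = 32, assumed length 183
    print(identity_similarity(94, 32, 183))   # (51.4, 68.9)
```

Mismatches are not needed for either percentage. In both entries, however, matches + mismatches + conservative substitutions comes to one less than the assumed length (296 of 297, 182 of 183), so the denominator may also count a gap or an unaligned position.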

42 72 102 132 162 192 222 252

CFK*TSSLSVIAWLIFSFHSTRSLK*VSMCFFCLSVSVIPCSIRT^*NAWGVIVNLNFYIV**LYFITNTNNGNXJRTFL

282 312 342 372 402 432 462 492

I*RKLL*WKKCKGATTMGTMILIAIIALFVIWLIVAYNSLVRSRMHTKESWSQIDVQLKRRNDLIPNLIETVKGYAAYEG

MIGWIIAIAVWILVLIYFGLYNSLVICYRNRVDETWAQIDVQLKRRFDLIPNLVETVKGYAKHEK
10 20 30 40 50 60

522 546 576 606 636 666 696 726

KTLEKIAELRAQVAK--ANTPAEAMTASNELTRQLSSILAVAENYPDLKANNSFVKLQEELTNTENKISYSRQLYNTTTS

ETLTQVIEARNKMMEVPADNRQGQIEADNMLSGALKSIFALGEAYPDLKANTSFIELQHELTTTENKVAYSRQLYNTTVM
80 90 100 110 120 130 140

756 786 816 846 876 906 936 966

NYNVKLETFPSNIVGKLFGFKPSQFLETPEEEKEVPKVSFDF*LRRERGFCCINKLQVIREKQLSC*LSSSVF*QLLEQL

TYNTKVQSVPTNIVAKLHNFTERDMLSIPEVERVAPKVEF
160 170 180

SEQ ID 4856 (GBS42) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 5 (lane 2; MW 21.8kDa) and in Figure 168 (lanes 5-7; MW 36kDa). It was also
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 13
(lane 8; MW 46kDa). Purified Thio-GBS42-His is shown in Figure 244, lane 11.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1572 

A DNA sequence (GBSx1666) was identified in S.agalactiae <SEQ ID 4859> which encodes the amino
acid sequence <SEQ ID 4860>. This protein is predicted to be glucose-inhibited division protein B (gidB).
Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence



--- Certainty=0.7177 (Affirmative) < succ>

--- Certainty=0.0000 (Not Clear) < succ>

--- Certainty=0.0000 (Not Clear) < succ>



Final Results 
bacterial cytoplasm --- Certainty=0.2430 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10079> which encodes amino acid sequence <SEQ ID 
10080> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 5 MTPQAFYQVLIEHGITLTDKQKKQFETYFRLLVEWNEKINLTAITDKEEVYLKHFYDSIA 64 

M + F L E GI+L+ +Q +QFE Y+ +LVEWNEKINLT+IT+K+EVYLKHFYDSI
Sbjct: 1 ^IEEFTSGLAEKGISLSPRQLEQFELYYDMLVEWNEKINLTSITEKKEVYLKHFYDSIT 60 

Query: 65 PILQGYID-NSPLSILDIGAGAGFPSIPMKILYPEIDITIIDSLNKRINFLNILANELEL 123 

Y+D N +I D+GAGAGFPS+P+KI +P + +TI+DSLNKRI FL L+ L+L
Sbjct: 61 AAF--YVDFNQVNTICDVGAGAGFPSLPIKICFPHLHVTIVDSLNKRITFLEKLSEALQL 118 

Query: 124 SGVHFFHGRAEDFGQDRVFRAKFDIVTARAVAKMQVIAELTIPFLKVNGRLIALKAAAAE 183 

F H RAE FGQ + R +DIVTARAVA++ VL+EL +P +K NG +ALKAA+AE 
Sbjct: 119 EOTTFCHDRAETFGQRKDVRESYDIVTARAVARLSVLSELCLPLVKKNGLFVALKAASAE 178

Query: 184 EELISAEKALKTLFSQVTVNKNYKLP-NGDDRNITIVSKKKETPNKYPRKAGTPNKKPL 241 

EEL + +KA+ TL ++ ++KLP DRNI ++ K K TP KYPRK GTPNK P+ 
Sbjct: 179 EELNAGKKAITTLGGELENIHSFKLPIEESDRNIMVIRKIKNTPKKYPRKPGTPNKSPI 237

A related DNA sequence was identified in S.pyogenes <SEQ ID 4861> which encodes the amino acid
sequence <SEQ ID 4862>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4862 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 170/237 (71%) , Positives = 202/237 (84%) 

Query: 5 MTPQAFYQVLIEHGITLTDKQKKQFETYFRLLVEWNEKINLTAITDKEEVYLKHFYDSIA 64
MTPQ FY+ LEG +L+ KQK+QF+TYF+ LVEWN KINLTAIT++ EVYLKHFYDSIA
Sbjct: 1 MTPQDFYRTLEEDGFSLSSKQKEQFDTYFKSLVEWNTKINLTAITEENEVYLKHFYDSIA 60

PILQG++ N P+ +LDIGAGAGFPS+PMKIL+P +++TIIDSLNKRI+FL +LA EL L

N +Y+LPNGD R ITIV KKKETPNKYPRKAG PNKKPL

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
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Example 1573 

A DNA sequence (GBSx1667) was identified in S.agalactiae <SEQ ID 4863> which encodes the amino
acid sequence <SEQ ID 4864>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.



Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for vaccines or diagnostics.



Example 1574 

A DNA sequence (GBSx1668) was identified in S.agalactiae <SEQ ID 4865> which encodes the amino
acid sequence <SEQ ID 4866>. This protein is predicted to be V-type sodium ATP synthase subunit J.
Analysis of this protein sequence reveals the following: 



Possible site: 45
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = ... Transmembrane ... - 387 ( 362 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 216 ( 190 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 441 ( 423 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 343 ( 325 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 97 ( 81 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 156 ( 139 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 71 ( 53 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 263 ( 247 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 181 ( 165 - 181)

Final Results

bacterial membrane --- Certainty=0.5055 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10081> which encodes amino acid sequence <SEQ ID 
10082> was also identified.



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA04279 GB:D17462 Na+-ATPase subunit J [Enterococcus hirae]
Identities = 170/461 (36%), Positives = 262/461 (55%), Gaps = 28/461 (6%)

Query: 12 KTMSVARKLSISFIAVILLGSILLSLPIFQYANAPKTHYIDHLFTTVSMVCVTGLSVPPI 71 
K +S + ++ F +IL G LL+LP F + TH+ID LFT S VCVTGL+ 
Sbjct: 10 KRLSPVQLIAAGFFILILFGGSLLTLPFFS-RSGESTHFIDALFTATSAVCVTGLTTLNT 68

Query: 72 SKVYNGWGQIVAILLMQTGGLGLVTLMSLSYYTLRRKMSLNDQTLLQSAITYNSSTDLKK 131
++ +N GQ + + L++ GGLG + + L + ++K+S + + +L+ A+ + + K 

Sbjct: 69 AEHWNSAGQFLIMTLIEIGGLGFMMIPILFFAIAKKKISFSMRIVLKFALNLEEMSGVIK 128

Query: 132 YLYMIFKVTLTLEVLAASILAIDFIPRFGLGHGIFNSIFLAVSAFCNAGFDNLEATSLAQ 191
+ I K + ++V+ A L++ FIP FG GI+ SIF AVS + FCNAGFD L + LA 
Sbjct: 129 LMIYILKFAWIQVIGAVALSWFIPEFGWAKGIWFSIFHAVSSFCNAGFDLLGDSLLAD 188
Query: 192 FKLNPLVNIIVCFLIISGGLGFAVWKDLIEATIQTSHKGPKLIKTFPKRLSNHSKLVLKT 251

+ N + ++V LII+GGLGF VW+D++ + H+ K+++ HSK+ L 

Sbjct: 189 -QTIWLIMWSALIIAGGLGFIVWRDIL SYHR VKKITLHSKVALSV 234 

Query: 252 TTIILLTGTLLSWLLEFGNFRTIANLSLPKQLMVSFFQTVTMRTAGFSTIDYTQTDFATN 311 

T ++L+ G +L +L+ N T+ + ++L +FF +VT RTAG+ +IDY Q A 
Sbjct: 235 TALLLIGGFIL-FLITERNGLTLVKGTFTERLANTFFMSVTPRTAGYYSIDYLQMSHAGL 293

Query: 312 LVYIIQMLIGGAPGGTAGGFKVTVIAILLLLFKAELSGQSQVTFHYRTIPSSIIKQTLSI 371 

++ + M IGG G TAGG K T + ILL+ A G+++ RTI + + L 
Sbjct: 294 ILTMFLMYIGGTSGSTAGGLKTTTLGILLIQMHAMFKGKTRAEAFGRTIRQAAV LRA 350



Query: 372 LTFFFIILISGYLL-LLELNPHIDPFS-LFFEASSALATVGVTMNTTNQLTLGGRI 425

LT FF+ h +++L + I S + FE SA TVG+TM T LTL G++ 
Sbjct: 351 LTLFFVTLSLCWAIMVLSVTETIPKTSGIEYIAFEVFSAFGTVGLTMGLTPDLTLIGKL 410

Query: 426 VIMFLMFIGRVGPITVLLSILQK KEKEIHYAETEIILG 463

VI+ LM+IGRVG +TV+LS+L K E YE I+LG 
Sbjct: 411 VIISLMYIGRVGIMTVVLSLLVKANRAEANYKYPEESIMLG 451

A related DNA sequence was identified in S.pyogenes <SEQ ID 4867> which encodes the amino acid 
sequence <SEQ ID 4868>. Analysis of this protein sequence reveals the following: 



INTEGRAL Likelihood = ... Transmembrane ... - 105 ( 81 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 216 ( 196 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 156 ( 139 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 71 ( 53 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 263 ( 246 - 264)
INTEGRAL Likelihood = ... Transmembrane ... - 409 ( 393 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 181 ( 165 - 181)

Final Results

bacterial membrane --- Certainty=0.7050 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



Query: 6 MKESFIKSLSVTQRLTFSFAIVILIGTLLLSMPFTHYQNGPNTVYLDHFFNVVSMVCVTG 65

MK+ K LS Q + F I+IL G LL++PF ++G +T ++D F S VCVTG 
Sbjct: 4 MKKRVRKRLSPVQLIAAGFFILILFGGSLLTLPFFS-RSGESTHFIDALFTATSAVCVTG 62 

Query: 66 LSVVPVAEVYNGIGQTIAMALMQIGCLGLVTLIAVSTFAL-KRKMRLSDQTLLQSALNRG 124

L+ + AE +N GQ + M L++IG LG + +I + FA+ K+K+ S + +L+ ALN
Sbjct: 63 LTTLWTAEHWNSAGQFLIMTLIEIGGLGFM-MIPILFFAIAKXKISFSMRIVLKFALNLE 121 

Query: 125 DSKDLKHYLFFAYKVTFSLEAFAAIVIMIDFIPRFGWKNGIFNSIFLAVSAFCNAGFDNL 184 

+ + + + K ++ A+ + + FIP FGW GI+ SIF AVS + FCNAGFD L
Sbjct: 122 EMSGVIKLMIYILKFAWIQVIGAVALSWFIPEFGWAKGIWFSIFHAVSSFCNAGFDLL 181



Query: 245 SRLVLQTTAVILFLGTFLTWFLEKDNSKTIANFSLHQQLMVSFFQTVTMRTAGFATISYN 304 

S++ L TA++L +G F+ + + + N T+ + ++L +FF +VT RTAG+ +I Y
Sbjct: 228 SKVALSVTALLL-IGGFILFLITERNGLTLVKGTFTERLANTFFMSVTPRTAGYYSIDYL 286 

Query: 305 DTIAPTNILYMIQMVIGGAPGGTAGGIKVTTAAITFLLFKAELSGQSEVTFRNRIIANKT 364
IL M M IGG G TAGG+K TT I + A G++ R I

Sbjct: 287 QMSHAGLILTMFLMYIGGTSGSTAGGLKTTTLGILLIQMHAMPKGKTRAEAFGRTIRQAA 346

Query: 365 I KQTMTVLI FFFAVLMIGFILLLSVEPHIAPIP LLFESISAIATVGVSMDLTPQLS 420 

+ + +T L F L + I++LSV I + FE SA TVG++M LTP L+ 

Sbjct: 347 VLRALT-LFFVTLSLCWAIMVLSVTETIPKTSGIEYIAFEVFSAFGTVGLTMGLTPDLT 405 

Query: 421 TAGRLIVIVLMFVGRVGPITVLISLI QRKEKTIQYATTDILVG 463 

G+L++I LM++GRVG +TV++SL+ R E +Y I++G
Sbjct: 406 LIGKLVIISLMYIGRVGIMTVVLSLLVKAMRAEANYKYPSSSIMLG 451



An alignment of the GAS and GBS proteins is shown below. 

Identities = 275/462 (59%), Positives = 351/462 (75%), Gaps = 1/462 (0%) 

Query: 2 GASMKHFFDYKTMSVARKLSISFIAVILLGSILLSLPIFQYANAPKTHYIDHLFTTVSMV 61

G +MK F K++SV ++L+ SF VIL+G++LLS+P Y N P T Y+DH F VSMV 
Sbjct: 3 GGNMKRSF-IKSLSVTQRLTFSFAIVILIGTLLLSMPFTHYQNGPNTVYLDHFFNVVSMV 61

Query: 62 CVTGLSVFPISKVYNGWGQIVAILLMQTGGLGLVTLMSLSYYTLRRKMSLNDQTLLQSAI 121

CVTGLSV P+++VYNG GQ +A+ LMQ G LGLVTL+++S + L+RKM L+DQTLLQSA+ 
Sbjct: 62 CVTGLSVVPVAEVYNGIGQTIAMALMQIGCLGLVTLIAVSTFALKRKMRLSDQTLLQSAL 121

Query: 122 TYNSSTDLKKYLYMIFKVTLTLEVLAASILAIDFIPRFGLGHGIFNSIFLAVSAFCNAGF 181

S DLK YL+ +KVT +LE AA ++ IDFIPRFG +GIFNSIFLAVSAFCNAGF
Sbjct: 122 NRGDSKDLKHYLFFAYKVTFSLEAFAAIVIMIDFIPRFGWKNGIFNSIFLAVSAFCNAGF 181

Query: 182 DNLEATSLAQFKLNPLVNIIVCFLIISGGLGFAVWKDLIEATIQTSHKGPKLIKTFPKRL 241 

DNL ++SL F LNP +N+I+ FLIISGGLGFAVW DL A + + P ++L
Sbjct: 182 DNLGSSSLKDFMIjNPTI^IITFLIISGI^FAVWVDI/SVAFKKYFFERPHCYGATFRKL 241 

Query: 242 SNHSKLVLKTTTIILLTGTLLSWLLEFGNFRTIANLSLPKQLMVSFFQTVTMRTAGFSTI 301 
SN S+LVL+TT +IL GT L+W LE N +TIAN SL +QLMVSFFQTVTMRTAGF+TI
Sbjct: 242 St^SIUjVli(^AVILFLGTFLTWFLEKDNSKTIANFSLHQQLMVSFFQTVTMRTAGFATI 301



Query: 302 DYTQTDFATNLVYIIQMLIGGAPGGTAGGFKVTVIAILLLLFKAELSGQSQVTFHYRTIP 361
Y T TN+ + Y+ IQM+ IGGAPGGTAGG KVT AI LLFKAELSGQS+VTF R I 
Sbjct: 302 SYMDTIAPTNILYMIQMVIGGAPGGTAGGIKVTTAAITFLLFKAELSGQSEVTFRNRIIA 361 

Query: 362 SSIIKQTLSILTFFFIILISGYLLLLELNPHIDPFSLFFEASSALATVGVTMNTTNQLTL 421 

+ IKQT+++L FFF +L+ G++LLL + PHI P L FE+ SA+ATVGV+M+ T QL+
Sbjct: 362 NKTIKQTMTVLIFFFAVLMIGFILLLSVEPHIAPIPLLFESISAIATVGVSMDLTPQLST 421 

Query: 422 GGRIVIMFLMFIGRVGPITVLLSILQKKEKEIHYAETEIILG 463 

GR++++ LMF+GRVGPITVL+S++Q+KEK I YA T+I++G 
Sbjct: 422 AGRLIVIVLMFVGRVGPITVLISLIQRKEKTIQYATTDILVG 463 
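The percentages in the BLAST summary lines of this section agree with 100 * n / d with the fraction discarded rather than rounded. For example, 275/462 is 59.5% and is printed as 59%, and 61/226 (26.99%) is printed as 26% in Example 1576.

The Python sketch below recomputes the printed values on that assumption. The truncation rule is inferred from these listings, not taken from BLAST documentation, and the helper names are illustrative. A recomputed value that differs from the printed one may point to a misread digit in the reproduced header. One such case is the Positives figure of the GAS/GBS alignment in Example 1575, where 176/221 computes to 79% but is printed as 78%.

```python
import re
from typing import Dict, Tuple

FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def blast_percent(count: int, length: int) -> int:
    """Integer percentage, truncated toward zero (assumed from the listings)."""
    return (100 * count) // length


def check_summary(line: str) -> Dict[str, Tuple[int, int]]:
    """Map each field to (printed %, recomputed %) for one BLAST summary line."""
    return {
        name: (int(printed), blast_percent(int(n), int(d)))
        for name, n, d, printed in FIELD.findall(line)
    }


print(check_summary("Identities = 275/462 (59%), Positives = 351/462 (75%), Gaps = 1/462 (0%)"))
# {'Identities': (59, 59), 'Positives': (75, 75), 'Gaps': (0, 0)}
print(check_summary("Identities = 132/221 (59%) , Positives = 176/221 (78%)"))
# {'Identities': (59, 59), 'Positives': (78, 79)}
```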

A related GBS gene <SEQ ID 8851> and protein <SEQ ID 8852> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1
McG: Discrim Score: ...
GvH: Signal Score (-7.5): 0.64

Possible site: 45
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 9 value: ... threshold: ...
INTEGRAL Likelihood = ... Transmembrane ... - 387 ( 362 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 216 ( 190 - 217)
INTEGRAL Likelihood = ... Transmembrane ... - 441 ( 423 - 446)
INTEGRAL Likelihood = ... Transmembrane ... - 343 ( 325 - 349)
INTEGRAL Likelihood = ... Transmembrane ... - 97 ( 81 - ...)
modified ALOM score: 2.53




WO 02/34771 



PCT/GB01/04789 



-1757- 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.5055 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02334(334 - 1689 of 1989)

KGAD|2215l|22827(10 - 451 of 451) v-type sodium ATP synthase subunit j {Enterococcus hirae} 
SP|P43440|NTPJ_ENTHR V-TYPE SODIUM ATP SYNTHASE SUBUNIT J (EC 3.6.1.34) (NA(+)-TRANSLOCATING
ATPASE SUBUNIT J). GP|487282|dbj|BAA04279.1||D17462 Na+-ATPase subunit J
{Enterococcus hirae}
%Match =18.8 

%Identity =38.5 %Similarity =60.4 

Matches = 170 Mismatches = 166 Conservative Sub.s = 97 

186 216 246 276 306 336 366 396 

TIFTSNCK*KL*OT*W**PKYHNR*QEKRNA**IPS*SWYSKQEAFVKXGASMKHFFDYKTMSVARKLSISFIAVILLGS 

I =| ■ == I =11=1 
MTIMKKRVRKRLSPVQLIAAGFFILILFGG 
10 20 30 

426 456 486 516 546 576 606 636 

ILLSLPIFQYANAPKTHYIDHLFTTVSMVCOTGLSVFPISKVYN^ 

11=11 I = 11=11 III I lllill: : :: :| || : : |:: ||||:= : | : ::|:|:: 
SLLTLPFFSKSGES-THFIDAI^TATSAVCOTGLTTIOTAEHWNSAGQFLIMTLIEIGGLGFMMIPILFFAIAKKKISFS 



DQTLLQSAITYNSSTDLKKYLYMIFKVTLTLEVIAASIIAIDFIPRFGLGHGIFNSIFLAVSAFCNAGFDNLEATSLAQF 

= =1= 1= = = i = 1 = 1 = = = h I l» III II 11= III llhlllllll I II 

MRIVLKEai^EEMSGVIKLMIYILKFAWIQVIGAVALSWFIPEFGWAKGIWFSIFHAVSSFraAGFD-LLGDSLIAD 
120 130 140 150 160 170 180 

906 936 966 996 1026 1056 1086 1116 

KraPLWIIVCFLIISGGLGFAWKDLIEaTIQTSHKGPKLIKTFPKRLSNHSKLVLKTTTIILLTGTLLSWLLEFGNFR 
=1 = ::| 111=11111 ll=|: I l=== 111= I I =11 I == =1= I 

QTNvYLIMWSALI IAGGLGFI VWRDI LSYHRVKKITLHSKVALSVTA- LLLIGGFILFLITERNGL 

200 210 220- 230 240 250 

1146 1176 1206 1236 1266 1296 1326 1356 

TIANLSLPKQLIWSFFQTVTMRTAGFSTIDYTQTDFATNLVYIIQMLIGGAPGGTAGGFKVTVIAILLLLFKAELSGQSQ 
1= == = = l =11 =11 1111= =111 I I == = I. Ill I 1111 = 1 I = 111= I = l = = = 
TLVKSTFTERLAmTFMSVTPRTAGYYSIDYLQMS:™ 

270 280 290 300 310 320 330 

1386 1416 1461 1491 1518 1548 1578 

VTFHYRTIPSSIIKQTLSILTFFFIIL ISGYLL-LLELNPHIDPFS-LFFEASSALATVGVTMNTTNQLTLGGRIV 

III = = I 11=11= I ===1=11 =11 11= 111=11 I III l==l 

AEAFGRTIROAAV LRALTLFFVTLSLCWAIMVLSvTETIPKTSGIEYIAFEVFSAFGTVGLTMGLTPDLTLIGKLV 

350 360 370 380 390 400 410 

1608 1638 1659 1689 1719 1749 1779 1809 

IMFLMFIGRVGPITVLLSILQK KEKEIHYAET3IILG*KRSFMKTKIIGVLGLGIFGQTLAQELSNFEQDVIAIDSN 

1= h II =11=11=1 I IN l=|| 

1 1 SLMYIGRVGIMTWLSLLVKANRAEANYKYPEES IMLG 
430 440 450 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1575

A DNA sequence (GBSx1669) was identified in S.agalactiae <SEQ ID 4869> which encodes the amino
acid sequence <SEQ ID 4870>. This protein is predicted to be TrkA (ktrA). Analysis of this protein 
sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC46144 GB:AF001974 putative TrkA [Thermoanaerobacter 
ethanolicus] 

Identities = 59/177 (38%), Positives = 110/177 (61%), Gaps = 2/177 (1%) 

Query: 8 VLGLGIFGQTLAQELSNFEQDVIAIDSNPEN--VQAVAEVVTKAAIGDITDLAFLKHIGI 65

V+GLG FG +LA+ L DV+ ID + E VQA+ +VT A D TD LK + + 

Sbjct: 6 VIGLGSFGISIAKTLYEMGNDVLVIDEDEEEELVQAMNGLVTHAVRADATDENVLKSLRV 65

Query: 66 SDCDTVIIATGNSLESSVLAVMHCKKLGVPQVIAKARNLVYEEVLYEIGADLVISPERES 125

+ D I+A G ++ESS++ M K+LGV VIAKA N ++ VLY++GAD V+ PE++ 
Sbjct: 66 KNFDVAIVAIGKNMESSIMVTMLVKELGVKYVIAKAHNELHARVLYKVGADRVVM 125

Query: 126 GQNVAANLMRNKITDVFQIESDISVIEFKIPKSWVGKTVEQLNIRHKFDLNLIGIRK 182

G VA N+ + + D+ + + S+ E + W GKT++++N+R K+ LN++ ++K
Sbjct: 126 GIRVARNVFSSNLIDrjIEFSKEYSIAEILPIEEWFGKTLKEINVREKYGLNVVAVKK 182

A related DNA sequence was identified in S.pyogenes <SEQ ID 4715> which encodes the amino acid 
sequence <SEQ ID 4716>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 132/221 (59%), Positives = 176/221 (78%)

Query: 1 MKTKIIGVLGLGIFGQTLAQELSNFEQDVIAIDSNPENVQAVAEVVTKAAIGDITDLAFL 60
+K K +GVLGLGIFG+T+A+ELSNF+QDVIAID +V+ VA++VTKAA+GDITD FL
Sbjct: 2 LKRKTVGVLGLGIFGRTVARELSNFDQDVIAIDIRESHVK3VADLVTKAAVGDITDKEFL 61

Query: 61 KHIGISDCDTVIIATGNSLESSVLAVMHCKKLGVPQVIAKARNLVYEEVLYEIGADLVIS 120

+GI CDTV+IA+GN+LESSVLAVMHCKKLGVP +IAKA+N ++EEVLY IGA VI + 
Sbjct: 62 IAVGIEHCDTVVIASGNNLESSVLAVMHCKKLGVPTIIAKAKNKIFEEVLYGIGATKVIT 121

Query: 121 PERESGQNVAANLMRNKITDVFQIESDISVIEFKIPKSWVGKTVEQLNIRHKFDLNLIGI 180

PER+SG+ VA+NL+R I + +E IS+IEF IPKSW G+++ +L++R K++LN+IG+ 
Sbjct: 122 PERDSGKRVASNLLRRHIESIIYLEHGISKIEEVIPKSWEGQSLSELDVRRKYELNVIGM 181 

Query: 181 RKAKNKPVDTEVPINSPLEEGIILVAIANSDAFQRYDYLGY 221 

R+ + K +DT V PLE I+VAIAN F+++DYLGY 
Sbjct: 182 RQKEVKTLDTNVKPFEPLEPNTIIVAIANDKTFEKFDYLGY 222 
A related GBS gene <SEQ ID 8853> and protein <SEQ ID 8854> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: 5.14 
GvH: Signal Score (-7.5): -0.860001

Possible site: 19 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 1.06 threshold: 0.0 
PERIPHERAL Likelihood = 1.06 192 
modified ALOM score: -0.71



*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

38.0/61.6% over 182aa
Thermoanaerobacter ethanolicus
GP|2581796| putative TrkA Insert characterized
ORF02030(322 - 864 of 1269)

GP|2581796|gb|AAC46144.1||AF001974(6 - 188 of 195) putative TrkA {Thermoanaerobacter ethanolicus}

%Match =15.5 

%Identity =37.9 %Similarity = 
Matches =69 Mismatches =69 

60 90 120 150 180 210 240 270 

LISGYLLLLEMPHIDPFSLFFEASSALATVGVTMNTTNQLTLGGR1VIMFLMFIGRVGPITVLLSILQKKEKEIHYAET 

300 330 360 390 444 474 504 

EIILG*KRSFMKTKIIGVLGLGIFGQTLAQELSNFEQDVIAIDSNPEN--VQAVAEWTKAAIGDITDLAFLKHIGISDC 
I hill II :||= I Ih II = I 111= HI I I II II : = = 

MKQFWIGLGSFGISIJUCTLYEMGNDVLVIDEDEEEELVQAMNGLVTHAvRADATDENVLKSLRVKNF 



DTVIIATGNSLESSVIAvMHCKKLGVPQVIAKARNLVXEEVLYEIGADLVISPERESGQNVAANLMRNKITDVFQIESDI 
I |:| | ::|||:: | |:||| I I I I I I ■ I I I = = I I I I : I I = = I II 1= = = 1= :, = 
DVAIVAIGKMMESSIMOTMLVKELGVKYVIAKAHNELHARVLYKTO 



SVIEFKIPKSOTGKTVEQLNIRHKFDLNLIGIRKAKNKPVDTEVPINSPLEEXIILVAIANSDAFQRYDYLRYFY*RK*K 
|s | : | |||::::|:| |: ||:: ::| :: : : 
SIAEILPIEEWFGKTLKEIIJWEKYGLNVVAVKKFNDEIIVSPGAGL 
160 170 180 190 

SEQ ID 8854 (GBS57) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 19 (lane 6; MW 26kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 21 (lane 11; MW 51.1kDa) and in 
Figure 183 (lane 9 & 10; MW 51kDa). 

The GBS57-GST fusion product was purified (Figure 99A; see also Figure 195, lane 8) and used to 
immunise mice (lane 1 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure
99B), FACS (Figure 99C), and in the in vivo passive protection assay (Table III). These tests confirm that
the protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1576 

A DNA sequence (GBSx1670) was identified in S.agalactiae <SEQ ID 4871> which encodes the amino
acid sequence <SEQ ID 4872>. Analysis of this protein sequence reveals the following:



Possible site: 40
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -11.62 Transmembrane 73 - 89 ( 68 - 96)
INTEGRAL Likelihood = -11.30 Transmembrane 254 - 270 ( 248 - 274)
INTEGRAL Likelihood = -4.73 Transmembrane 127 - 143 ( 124 - 144)
INTEGRAL Likelihood = -4.19 Transmembrane 50 - 66 ( 47 - 67)
INTEGRAL Likelihood = -3.29 Transmembrane 25 - 41 ( 25 - 45)

Final Results

bacterial membrane Certainty=0.5649 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8855> which encodes amino acid sequence <SEQ ID 8856> 
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: -10.49 
GvH: Signal Score (-7.5): -1.14 
Possible site: 40 
>>> Seems to have no N-terminal signal sequence

ALOM program count: 5 value: -11.62 threshold: 0.0 

INTEGRAL Likelihood =-11.62 Transmembrane 73 - 89 ( 68 - 96) 
INTEGRAL Likelihood =-11.30 Transmembrane 254 - 270 ( 248 - 274) 
INTEGRAL Likelihood = -4.73 Transmembrane 127 - 143 ( 124 - 144) 
INTEGRAL Likelihood = -4.19 Transmembrane 50 - 66 ( 47 - 67)
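In these ALOM listings, each INTEGRAL line gives a likelihood and a transmembrane segment; every segment in this section spans 17 residues. The "value" on the "ALOM program count" line appears to be the lowest likelihood. The count appears to be the number of segments at or below the threshold (0.0 here). This reading is inferred from the listings in this document rather than from ALOM documentation.

The Python sketch below recomputes both figures from INTEGRAL lines; `summarize_alom` is a hypothetical helper. Only four INTEGRAL lines are reproduced above for SEQ ID 8856, whereas the program count is 5, so the sketch returns a count of 4 for them. The minimum it finds, -11.62, matches the reported value.

```python
import re
from typing import List, Optional, Tuple

# Assumed layout: "INTEGRAL Likelihood = -4.73 Transmembrane 127 - 143 ( 124 - 144)"
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def summarize_alom(
    lines: List[str], threshold: float = 0.0
) -> Tuple[int, Optional[float], List[Tuple[int, int]]]:
    """Return (segments at or below threshold, lowest likelihood, sorted core segments)."""
    segments = []
    for line in lines:
        m = INTEGRAL.search(line)
        if m:
            segments.append((float(m.group(1)), int(m.group(2)), int(m.group(3))))
    integral = [s for s in segments if s[0] <= threshold]
    lowest = min((s[0] for s in segments), default=None)
    cores = sorted((start, end) for _, start, end in integral)
    return len(integral), lowest, cores


seq_8856 = [
    "INTEGRAL Likelihood =-11.62 Transmembrane 73 - 89 ( 68 - 96)",
    "INTEGRAL Likelihood =-11.30 Transmembrane 254 - 270 ( 248 - 274)",
    "INTEGRAL Likelihood = -4.73 Transmembrane 127 - 143 ( 124 - 144)",
    "INTEGRAL Likelihood = -4.19 Transmembrane 50 - 66 ( 47 - 67)",
]
count, value, cores = summarize_alom(seq_8856)
print(count, value, cores)
print(all(end - start + 1 == 17 for start, end in cores))  # 17-residue segments
```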



3.76 



201 



*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5649 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13178 GB:Z99110 ykoC [Bacillus subtilis] 
Identities = 61/226 (26%), Positives = 108/226 (46%), Gaps = 12/226 (5%)

Query: 49 FLIWSLGSLVLFRLAKIKWQQVSFVMTLWVFAVLNIIMVYLFAPHYGDKIYGSSSLLL 108 

F 1+4 G L+ + KW + + F +L V+ A K+ + L 

Sbjct: 36 FYIIIVAGVLLAAGIPLKKW LLFTIPFLILAFGCVWTAAVF- -GKVPTTPDNFL 87 

Query: 109 KGIGPYDVTSQELFYLFNLILKYFCTVPLALLFLMTTNPSQFASSL-NQLGLSYKIAYAV 167 

GP + S + +L + C L+++F+ TT+P F SL Q LS K+AY V 
Sbjct: 88 FQAGPISINSDNVSVGISLGFRILCFSALSMMFVFTTDPILFMLSLVQQCRLSPKLAYGV 147 



Query: 168 SLTLRYIPDVQEEFYTIRRAQEARGIELSKKSNLVARIKGNLQIVTPLIFSSLERIDTVA 227 
R++P +++E I++A + RG + +S ++ +I + PL+ S++ + + A
Sbjct: 148 IAGFRFLPLLKDEVQLIQQAHKIRGG--AAESGIINKISALKRYTIPLLASAIRKAERTA 205



Query: 228 TAMELRRFGKNKRRTWYSKQSLEKSDIVLIILALASLFVSLYLIHL 273 

AME + F ++ RT+Y S+ + D V L L LF +L+ L
Sbjct: 206 LAMESKGFTGSRNRTYYRTLSVNRRDWVFFCLVLL-LFAGSFLVSL 250
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1577 

A DNA sequence (GBSx1671) was identified in S.agalactiae <SEQ ID 4873> which encodes the amino
acid sequence <SEQ ID 4874>. This protein is predicted to be cobalt ABC transporter, ATP-binding protein 
(cbiO). Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.91 Transmembrane 436 - 452 ( 435 - 452) 

Final Results 

bacterial membrane Certainty=0.1765 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13179 GB:Z99110 similar to cation ABC transporter 
(ATP-binding protein) [Bacillus subtilis] 
Identities = 151/483 (31%), Positives = 248/483 (51%), Gaps = 19/483 (3%)

Query: 8 KDFTFQYDVQSEPTLKGINLSIPKGEKVLILGPSGSGKSTLGHCLNGIIPNTHKGQYSGI 67

+ +F Y+ +P +I+ + KGE VL+LGPSG GKS+L CLNG+ P G SG
Sbjct: 11 EQLSFSYEEDEKPVFQDISFELQKGECVLLLGPSGCGKSSLALCLNGLYPEACDGIQSGH 70

Query: 68 FTINHKNAFDLSIYDK-SHLVSTVIQDPDGQFIGLTVAEDIAFALENDVVAQEEMASIVE 126

+ K D + + V QDPD QF LTV ++IAF LEN + +EEM + 

Sbjct: 71 VFLFQKPVTDAETSETITQHAGVVFQDPDQQFCKLTVEDEIAFGLENLQIPKEEMTEKIN 130

Query: 127 MWAKRLEIAPLLSKRPQDLSGGQKQRVSLAGVLVDDSPILLFDEPLANLDPQSGQDIMAL 186 

+L I L K LSGGQKQ+V+LA +L + +++ DEP + LDP S ++ + L 
Sbjct: 131 AVLGK1RITHLKEKMISTLSGGQKQKVAIACILAMEPELIILDEPTSLLDPFSAREFVHL 190 

Query: 187 VDRIHQEQDATTIIIEHRLED--VFYERVDRVVLFSDGQIIYNGEPDQLL--KTNFLSEY 242

+ + +E+ + ++IEH+L++ + ER +VL G+ +G L + L + 
Sbjct: 191 MKDLQREKGFSLLVIEHQLDEWAPWIERT--IVLDKSGKKALDGLTKNLFQHEAETLKKL 248 

Query: 243 GIREPLYISALKNLGYDFEKQNTMTSIDDFDFSELLIPKMRALDLDKHTDKXLSVQHLSV 302 

GI P + L F M + + K +A + +L V LS 

Sbjct: 249 GIAIPKVCHLQEKLSMPFTLSKEMLFKEPIPAGH--VKKKKA PSGESVLEVSSLSF 302 

Query: 303 SYDLENNTLDDVSFDLYKGQRLAIVGKNGAGKSTLAKALCQFI-PNNATLIYNNEDVSQD 361

+ + D+SF L +G A+VG NG GKSTL L + P +++++ + + 

Sbjct: 303 ARG-QQAIFKDISFSLREGSLTALVGPNGTGKSTLLSVLASLMKPQSGKILLYDQPLQKY 361 

Query: 362 SIKERAERIGYVLQNPNQMISQAMVFDEVALGLRLRGFSDNDIESRVYDILKVCGLYQFR 421 

KE +R+G+V QNP V+DE+ G + ++ + E + +L+ GL 
Sbjct: 362 KEKELRKRMGFVFQNPEHQFVTDTVYDELLFGQK ANAETEKKAQHLLQRFGLAHLA 417 

Query: 422 NWPISALSFGQKKRVTIASILILNPEVIILDEPTAGQDMKHYTEMMSFLDKLSCDGHTIV 481 

+ A+S GQK+R+++A++L+ + +V++LDEPT GQD + EM + ++ +G ++ 
Sbjct: 418 DHHPFAISQGQKRRLSVATMLMHDVIO/LLLDEPTFGQDARTAAECMEMIQRIKAEGTAVL 477 

Query: 482 MIT 484 
MIT 

Sbjct: 478 MIT 480 

There is also homology to SEQ ID 4416. 

SEQ ID 4874 (GBS424d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 146 (lane 2 & 4; MW 77kDa) and in Figure 239 (lane 10; MW 77kDa). It 
was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in
Figure 146 (lane 5 & 7; MW 52kDa) and in Figure 182 (lane 4; MW 52kDa). Purified GBS424d-His is 
shown in Figure 241, lanes 6 & 7. Purified GBS424d-GST is shown in Figure 246, lane 12. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1578 

A DNA sequence (GBSx1672) was identified in S.agalactiae <SEQ ID 4875> which encodes the amino
acid sequence <SEQ ID 4876>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.12 Transmembrane 39 - 55 ( 35 - 63)
INTEGRAL Likelihood = -3.98 Transmembrane 72 - 88 ( 71 - 90)
INTEGRAL Likelihood = -3.66 Transmembrane 108 - 124 ( 106 - 127)
INTEGRAL Likelihood = -2.34 Transmembrane 182 - 198 ( 181 - 198)
INTEGRAL Likelihood = -1.44 Transmembrane 141 - 157 ( ... - 158)

Final Results

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 31 MNTNTIKKWATGIGAALFIIIGMLVNIPTPIPNTNIQLQYAVLALFAVIYGPGVGFFTG 90

M N++K WATGIGAALF+IIG L+NIPTPIPNT+IQLQYAVLALF+ ++GP GF G 
Sbjct: 1 MKNNSVKIWATGIGAALFVIIGWLINIPTPIPNTSIQLQYAVLALFSALFGPLAGFLIG 60

Query: 91 FIGHALKDSIQYGSPWWTWVLVSGLLGLMIGFFAKKLAIQLSGMTKKDLLLFNVVQVIAN 150 

FIGHALKDS YG+PWWTWVL SGL+GL +GF K+ ++ K+++ FN+VQ +AN 

Sbjct: 61 FIGHALKDSFLYGAPWWn^LGSGLMGLFIGFGVKRESLTCGIFGNKEIIRFNIVQFLAN 120

Query: 151 LIGWSWAPYGDIFFYSEPASKVFAQGFLSSLVNSITIGVGGTLLLLAYAKSRPQKGSLS 210 

++ W ++AP GDI YSEPA+KVF QG ++ LVN++TI V GTLLL YA +R + G+L 
Sbjct: 121 VWWGLIAPIGDILVYSEPANKVFTQGWAGLVNALTIAVAGTLLLKLYAATRTKSGTLD 180 



Query: 211 KD 212 
K+ 

Sbjct: 181 KE 182 



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8857> and protein <SEQ ID 8858> were also identified. Analysis of this 

protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: -5.01 
GvH: Signal Score (-7.5): -5.9 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 5 value: -8.12 threshold: 0.0 

INTEGRAL Likelihood = -8.12 Transmembrane 31 - 47
INTEGRAL Likelihood = -3.98 Transmembrane 64 - 80
INTEGRAL Likelihood = -3.66 Transmembrane 100 - 116
INTEGRAL Likelihood = -2.34 Transmembrane 174 - 190
INTEGRAL Likelihood = -1.44 Transmembrane 133 - 149

5.78

2.11
*** Reasoning Step: 3

Final Results 

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02330(367 - 912 of 1212)

GP|6165407|emb|CAB59830.1||AJ012388(1 - 182 of 182) hypothetical protein {Lactococcus lactis}

%Match =28.1 

%Identity = 59.9 %Similarity = 78.6
Matches = 109 Mismatches = 39 Conservative Sub.s = 34

102 132 162 192 222 252 282 312 

MQWGVGFIVGVIQDSCETALNSSTDVLFTAVAEKSVFGKK*TNEGLRYSI*DLFWYLILFSIVFQFFLSIRFQISLKYD 

342 372 402 432 462 492 522 552

KIEQIVSDCLSLFFREVF^WINTIKKWATGIGAALFIIIGMLvNIPTPIPNT^3IQLQYAVLALFAVIYGPGVGFFTGFI 

i ho innninnni MimmMimimih ■■■■\\ in in 

MKNNSVKI WATGIGAALFVI IGWLINIPTPI PNTSIQLQYAVLALFSALFGPLAGFLIGFI 
10 20 30 40 50 60 


582 612 642 672 702 732 762 792 

GHALKDSIQYGSPIWTWVLVSGLLGLMIGFFAKKIAIQLSGMTKKDLLLFNWQVIANLIGWSWAPYGDIFFYSEPASK 

IMIIM I: MM llhll HI I = = = ]= = = 11 = 1) =11- I -II 111= 11111=1 

GHAT,KDSFLYGAPWWT^GSGLMGLFLGFGVKRESLTCX3IFGNKEIIRraiVQFIAimfWGLIAPIGDILVYSEPANK 
80 90 100 110 120 130 140

822 852 882 912 942 972 1002 1032 

VFAQGFLSSLVNSITIGVGGTLLLLAYAKSRPQKGSLSKD*DKRVIYERFY*MEGFYLSI*RSI*TNFKRD*LKHS*R*K 

ii ii == iii = = ii i inn n =i = ni i = 

VFTQGWAGLWALTIAVAGTLLLIOjYAATRTKSGTLDKE
160 170 180 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1579

A DNA sequence (GBSx1673) was identified in S.agalactiae <SEQ ID 4877> which encodes the amino
acid sequence <SEQ ID 4878>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -6.85 Transmembrane 86 - 102 ( 80 - 106)

Final Results 

bacterial membrane Certainty=0.3739 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1580

A DNA sequence (GBSx1674) was identified in S.agalactiae <SEQ ID 4879> which encodes the amino
acid sequence <SEQ ID 4880>. Analysis of this protein sequence reveals the following: 



Possible site: 47

>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -3.61 Transmembrane 107 - 123 ( 96 - 124)
INTEGRAL Likelihood = -1.86 Transmembrane 124 - 140 ( 124 - 142)
INTEGRAL Likelihood = -1.38 Transmembrane 83 - 99 ( 83 - 100)
INTEGRAL Likelihood = -1.12 Transmembrane ... - 158 ( 142 - 160)

Final Results

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9415> which encodes amino acid sequence <SEQ ID 9416> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76124 GB:AE000391 putative transport protein [Escherichia 
coli K12] 

Identities = 139/178 (78%), Positives = 159/178 (89%)

Query: 1 MVGTMLFVALVVNPIIAFVMMRKNPYPLVLRCLKDSGITAFFTRSSAANIPVNMRLCEDL 60

+VG ML VALWNP++ + +R+NP+PLVL CL++SG+ AFFTRSSAANI PVNM LCE L 
Sbjct: 222 LVGCMLLVALVWPLLVWWKIRRNPFPLVLLCLRESGVYAFFTRSSAANIPVNMALCE2CL 281 

Query: 61 GLDKDTYSVSIPLGAAINMAGAAITINILTIAAVNTLGITVDFPTAFLLSVVAAVSACGA 120

LD+DTYSVSIPLGA INMAGAAITI +LTLAAVNTLGI VD PTA LLSWA++ ACGA 
Sbjct: 282 NLDRDTYSVSIPLGATINMAGAAITITVLTIAAVNTLGIPVDLPTALLLSVVASLCACGA 341

Query: 121 SGVTGGSLLLIPVACSLFGISNDVAMQVVGVGFIVGVIQDSCETALNSSTDVLFTAVA 178
SGV GGSLLLIP+AC++FGISND+AMQVV VGFI+GV+QDSCETALNSSTDVLFTA A
Sbjct: 342 SGVAGGSLLLIPLACNMFGISNDIAMQVVAVGFIIGVLQDSCETALNSSTDVLFTAAA 399



A related DNA sequence was identified in S.pyogenes <SEQ ID 4881> which encodes the amino acid
sequence <SEQ ID 4882>. Analysis of this protein sequence reveals the following: 



Possible site: 58

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -13... Transmembrane ... - 228 ( 202 - ...)
INTEGRAL Likelihood = -7... Transmembrane ... - 94 ( 74 - ...)
INTEGRAL Likelihood = -6... Transmembrane ... - 195 ( 175 - ...)
INTEGRAL Likelihood = -6... Transmembrane ... - 331 ( 312 - ...)
INTEGRAL Likelihood = -5... Transmembrane ... - 60 ( 42 - ...)
INTEGRAL Likelihood = -4... Transmembrane ... - 29 ( 11 - ...)
INTEGRAL Likelihood = -3... Transmembrane ... - 356 ( 333 - ...)
INTEGRAL Likelihood = -3... Transmembrane ... - 161 ( 144 - ...)
INTEGRAL Likelihood = -0... Transmembrane 358 - 374 ( 358 - 376)

Final Results

bacterial membrane --- Certainty=0.6477 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAF95950 GB:AE004347 sodium/dicarboxylate symporter [Vibrio cholerae]
Identities = 243/385 (63%), Positives = 299/385 (77%), Gaps = 2/385 (0%)



Query: 9 VRVSLIKKIGIGWIGVMLGILAPDLTG-FSILGKLFVGGLKAIAPLLVFALVSQAISHQ 67 
VR +L+ +I G+++G + +P+ ++G LFVG LKA+AP+LVF LV+ +I++Q
Sbjct: 11 VRGNLVLQILAGILLGAAMATFSPEYAQIC^GLIGNLFVGALKAVAPVLVFILVASSIANQ 70

Query: 68 KKBRQTNMTLIIVLYLFGTFASALVA^TjTAYLFPLTLVLNTPVmELSPPQGVAEVFQSL 127 

KK + T M I+VLYLFGTF++AL AV+ ++LFP TLVL T +PPQG+AEV +L 

Sbjct: 71 KKNQHTYMRPIVVLYLFGTFSAALTAVILSFLFPTTLVLATGAEGA-TPPQGIAEVLNTL 129 

Query: 128 LLKLVDNPINALATANYIGVLSWAIIFGLALKAASKETKHLIKTAAEVTSQIVWIIN^ 187

L KLVDNP++AL ANYIG+L+W + GLAL +S TK + + + SQIV +11 LA
Sbjct: 130 LFKLVDNPVSALMNANYIGIIAWGVGLGLALHHSSSTTKAVPEDLSHGISQIVRFIIRLA 189 

Query: 188 PIGIMSLVFTTISENGVGILSDYAFLILVLVGTMLFVALVVNPLIAVLITRQNPYPLVLR 247

P GI LV +T + G L+ YA L+ VL+G M F+ALVVNP+I R+NP+PLVL+
Sbjct: 190 PFGIFGLVASTFATTGFDALAGYAQLLAVLLGAMAFIALVVNPMIVYYKIRRNPFPLVLQ 249

Query: 248 CLRESGLTAFFTRSSAANIPVNMQLCQKIGLSKDTYSVSIPLGATINMGGAAITINVLTL 307 

CLRESG+TAFFTRSSAANIPVNM LC+K+ L +DTYSVSIPLGATINM GAAITI VLTL
Sbjct: 250 CLRESGVTAFFTRSSAANIPVNMALCEKLCLDEDTYSVSIPLGATINMAGAAITITVLTL 309

Query: 308 AAVHTFGIPIDFLTALLLSVVAAVSACGASGVAGGSLLLIPVACSLFGISNDLAMQVVGV 367

AAVHT GI +D +TALLLSVVAAVSACGASGVAGGSLLLIP+AC LFGISND+AMQVV V
Sbjct: 310 AAVHTMGIEVDLMTALLLSVVAAVSACGASGVAGGSLLLIPtACGLFGISNDIAMQVVAV 369

Query: 368 GFIVGVIQDSCETALNSSTDVLFTA 392

GFI+GVIQDS ETALNSSTDVLFTA
Sbjct: 370 GFIIGVIQDSAETALNSSTDVLFTA 394 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/186 (82%), Positives = 172/186 (92%)

Query: 1 MVGTMLFVALWNPI lAFVMMRKNPYPLVLRCIiKDSGITAFFTRSSAANI PVNMRLCEDL 60 

+VGTMLFVALWNP+IA ++ R+NPYPLVLRCL++SG+TAFFTRSSAANIPVNM+LC+ + 
Sbjct: 217 LVGTMLFVALVVNPLIAVLITRQNPYPLVLRCLRESGLTAFFTRSSAANIPVNMQLCQKI 276

Query: 61 GLDKHTYSVSIPLGiAAINMAGAAITINIM^ 120 

GL KDTYSVSIPLGA INM GAAITIN+LTIAAV+T GI +DF TA LLSWAAVSACGA 
Sbjct: 277 GLSKDTYSVSIPLGATINMGGAAITINVLTIAAVHTFGIPIDFLTALLLSWAAVSACGA 336 

Query: 121 SGVTGGSLLLIPVACSLFGISNDVAMQWGVGFIVGVIQDSCETALNSSTDVLFTAVAEK 180

SGV GGSLLLIPVACSLFGISND+AMQWGVGFIVGVIQDSCETALNSSTDVLFTA+AE
Sbjct: 337 SGVAGGSLLLIPVACSLFGISNDLAMQWGVGFIVGVIQDSCETALNSSTDVLFTAIAEN 396 

Query: 181 SVFGKK 186 

+ + +K 
Sbjct: 397 AFWKRK 402 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
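The alignment summaries above report identities and positives as a count over the aligned length together with a whole-number percentage (for example, "Identities = 153/186 (82%)"). As a minimal sketch only, assuming summary lines in exactly that form, the following Python helper extracts the counts and recomputes the exact percentage. The names parse_blast_summary and consistent are hypothetical and are not part of the original analysis; because the rounding rule behind the reported percentages is not stated, the check simply accepts a reported value within one percentage point of the exact one.

```python
import re

# Hypothetical pattern for fields such as "Identities = 153/186 (82%)".
_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_blast_summary(line):
    """Return {field: (count, length, reported_pct, exact_pct)} for one summary line."""
    result = {}
    for name, count, length, pct in _FIELD.findall(line):
        count, length, pct = int(count), int(length), int(pct)
        exact = 100 * count / length  # exact percentage as a float
        result[name] = (count, length, pct, exact)
    return result


def consistent(entry, tolerance=1.0):
    """True if the whole-number percentage lies within `tolerance` points of the exact value."""
    _count, _length, reported, exact = entry
    return abs(exact - reported) <= tolerance


if __name__ == "__main__":
    summary = parse_blast_summary("Identities = 153/186 (82%), Positives = 172/186 (92%)")
    for field, entry in summary.items():
        print(field, entry[0], entry[1], round(entry[3], 1), consistent(entry))
```

For the GAS/GBS alignment above, 153/186 is about 82.3% and 172/186 about 92.5%, in line with the reported 82% and 92%.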

Example 1581 

A DNA sequence (GBSx1675) was identified in S.agalactiae <SEQ ID 4883> which encodes the amino
acid sequence <SEQ ID 4884>. This protein is predicted to be an acid phosphatase. Analysis of this protein
sequence reveals the following:

Possible site: 40 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2436 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
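Each "Final Results" block lists a certainty score for every bacterial compartment, with the supported compartment marked "(Affirmative)". A minimal Python sketch of reading such a block, assuming lines of the form "bacterial cytoplasm Certainty=0.2436 (Affirmative)" and assuming that the affirmative compartment with the highest certainty is taken as the predicted location; the function name predicted_location is hypothetical and not part of the original analysis.

```python
import re

# Hypothetical pattern for result lines such as
# "bacterial cytoplasm Certainty=0.2436 (Affirmative)"; separators like " --- " are skipped.
_RESULT = re.compile(r"^(bacterial \w+)\W*Certainty=\s*([\d.]+)\s*\((Affirmative|Not Clear)\)")


def predicted_location(lines):
    """Return (compartment, certainty) for the highest-scoring affirmative line, or None."""
    best = None
    for line in lines:
        match = _RESULT.match(line.strip())
        if match is None:
            continue  # not a result line
        place, score, verdict = match.group(1), float(match.group(2)), match.group(3)
        if verdict == "Affirmative" and (best is None or score > best[1]):
            best = (place, score)
    return best


if __name__ == "__main__":
    block = [
        "bacterial cytoplasm Certainty=0.2436 (Affirmative)",
        "bacterial membrane Certainty=0.0000 (Not Clear)",
        "bacterial outside Certainty=0.0000 (Not Clear)",
    ]
    print(predicted_location(block))
```

A block with no affirmative line, such as those where every compartment is "(Not Clear)", yields None under this sketch.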
A related GBS nucleic acid sequence <SEQ ID 9427> which encodes amino acid sequence <SEQ ID 9428>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 7 EQKTKFKHISLSSNKLLAKENTMSVLWYQNSJ^l^yLQGYWAKMKLDDWLQKPSEKP 66

++ K ++ S +L + ENTMSVLWYQ +AEAKALYLQGY +A +L + L + ++KP
Sbjct: 34 KETWQTKVTYSDEQLRSNENTMSVLWYQRAAEAKALYLQGYQIATDRLKNQLGQATDKP 93 

Query: 67 YSIILDLDETVLDNSPYQAKNIKDGSSFTPESWDKWVQKKSAKAVAGAKEFLKYANEKGI 126 

YSI+LD+DETVLDNSPYQAKNI +G+SFTPESWD WVQKK AK VAGAKEFL++A++ G+
Sbjct: 94 YSIVLDIDETVLDNSPYQAKNILEGTSFTPESWDVWVQKKEAKPVAGAKEFLQFADQNGV 153

Query: 127 KIYYVSDRTDAQVDATKENLEKEGIPVQGKDHLLFLKKGMKSKEERRQAVQKDTNLIMLF 186

+IYY+SDR +QVDAT ENL+KEGIPVQG+DHLLFL++G+KSKE+RRQ V++ TNLIMLF 
Sbjct: 154 QIYYISDRAVSQVDATMENLQKEGIPVQGRDHLLFLEEGVKSKEARRQKVKETTNLIMLF 213 

Query: 187 GDNLVDFADFSKSSSTDREQLLTKLQSEFGSKFIVFPNPMYGSWESAIYQGKHLDVQKQL 246 

GDNLVDFADFSK S DR LL++LQ EFG +FI+FPNPMYGSWESA+Y+G LD QL 
Sbjct: 214 GDNLVDFADFSKKSEEDRTALLSELQEEFGRQFIIFPNPMYGSWESAVYKGDKLDASHQL 273 

Query: 247 KERQKMLHSYD 257 

KER+K L S++ 
Sbjct: 274 KERRKALESFE 284 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4885> which encodes the amino acid 
sequence <SEQ ID 4886>. Analysis of this protein sequence reveals the following: 

Possible site: 25 



> May be a lipoprotein 



Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA73175 GB:Y12602 acid phosphatase [Streptococcus equisimilis] 
Identities = 234/284 (82%), Positives = 261/284 (91%)

Query: 1 MKSKKWSVISLTLSLFLVTGCAKAnDI^KSVNDKPATKQTYNSYSDDQLRSRENTMSVLW 60
MK+K+V SVISL LSLFLVTGCA++D+ +VN K KQT +YSD+QLRS ENTMSVLW
Sbjct: 1 MKTKQVASVISLALSLFLVTGCAQLDHKAKrVNSKETWQTKVTYSDEQLRSNENTMSVLW 60

Query: 61
YQRAAE +ALYLQGYQLATDRLK QL + TDKPYSIVLDIDETVLDNSPYQAKN+LEGT
Sbjct: 61

Query: 121
FTPESWD WVQKKEAKPVAGAK+FLQFADQNGVQIYYISDR+ +QVDATMENLQKEGIPV
Sbjct: 121

Query: 181
QGRDHLLFLE+GVKSKE+RRQKVKETTN+ MLFGDNL+DFADFSKKS+EDRTALLS+LQE
Sbjct: 181

Query: 241
EFGR+FIIFPNPMYGSWE A+YKG+KLD QL+ERRK+L+SF+
Sbjct: 241



An alignment of the GAS and GBS proteins is shown below. 
Identities = 165/247 (67%), Positives = 207/247 (83%)

Query: 10 TKFiarcSLSSNKLIAKElTOlSVLU^^ 69
TK S S ++L ++ENTMSVLWYQ +AE +ALYLQGY +A +L + I, KP++KPYSI
Sbjct: 37 TKQTYNSYSDDQLRSRENTMSVLWYQRAAETQALYLQGYQLATDRLKEQLNKPTDKPYSI 96

Query: 70 ILDLDETVLDNSPYQAKNIKDGSSFTPESWDKWVQKKSAKAVAGAKEFLKYANEKGIKIY 129
+LD+DETVLDNSPYQAKN+ +G+ FTPESWD WVQKK AK VAGAK+FL++A++ G++IY
Sbjct: 97 VLDIDETVLDNSPYQAKNVLEGTGFTPESWDYWVQKKEAKPVAGAKDFLQFADQNGVQIY 156

Query: 130 YVSDRTDAQVDATKENLEKEGIPVQGKDHLLFLKKGMKSKESRRQAVQKDTNLIMLFGDN 189
Y+SDR+ QVDAT ENL+KEGIPVQG+DHLLFL+KG+KSKESRRQ V++ TN+ MLFGDN
Sbjct: 157 YISDRSTTQVDATMENLQKEGIPVQGRDHLLFLEKGVKSKESRRQKVKETTNVTMLFGDN 216

Query: 190 LVDFADFSKSSSTDREQLLTKLQSEFGSKFIVFPNPMYGSWESAIYQGKHLDVQKQLKER 249
L+DFADFSK S DR LL+ LQ EFG +FI+FPNPMYGSWE AIY+G+ LDV KQL+ER
Sbjct: 217 LLDFADFSKKSQEDRTALLSDLQEEFGRRFIIFPNPMYGSWEGAIYKGEKLDVLKQLEER 276

Query: 250 QKMLHSY 256
+K L S+
Sbjct: 277 RKSLKSF 283





SEQ ID 9428 (GBS661) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 136 (lanes 2 & 4; MW 61kDa + lane 3; MW 27kDa) and in Figure 186 (lane 11;
MW 61kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 136 (lanes 5-7; MW 25kDa).

GBS661-GST was purified as shown in Figure 237, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1582

A DNA sequence (GBSx1676) was identified in S.agalactiae <SEQ ID 4887> which encodes the amino
acid sequence <SEQ ID 4888>. This protein is predicted to be an unnamed protein product. Analysis of this
protein sequence reveals the following:

Possible site: 58 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3462 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4889> which encodes the amino acid
sequence <SEQ ID 4890>. Analysis of this protein sequence reveals the following: 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3462 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 395/398 (99%), Positives = 398/398 (99%)
Query: 1 MAKLTVKDVDLKGKKVLTOVDFNVPLKDGVITNT)NRITAALPTIICYIIEQGGRAILFSHL 60
MAKLTVKDVDLKGKIOTOTVD
Sbjct: 1 MAKLTVKDVDLKGKKVLWVDFNVPLK^ 50 

Query: 61 GRVKEEADKEGKSIAPVAADLAAKLGQDWFPGVTRGAKLEEAINALEDGQVLLVENTRF 120

GRVKEEADKEGKSIAPVAADLAAKLGQDWFPGVTRG+KLEEAINALEDGQVLLVENTRF
Sbjct: 61 GRVKEEADKEGKSIAPVAADLAAKLGQDWFPGVTRGSKLEEAINALEDGQVLLVENTRF 120

Query: 121 EDVDGKKESKNDEELGKYWASLGDGIFVNDAFGTAHRAHASNVGISANVEKAVAGFLLEN 180

EDVDGKKESKNDEELGKYWASLGDGIFVNDAFGTAHRAHASNVGISANVEKAVAGFLLEN
Sbjct: 121 EDVDGKKESKNDEELGKYWASLGDGIFVNDAFGTAHRAHASNVGISANVEKAVAGFLLEN 180

Query: 181 EIAYIQEAVETPERPFVAILGGSIWSDKIGVIENLLEKADKVLIGGGMTYTFYKAQGIEI 240

EIAYIQEAVETPERPFVAILGGSKVSDKIGVIENLLEKADKVLIGGGMTYTFYKAQGIEI
Sbjct: 181 EIAYIQEAVETPERPFVAILGGSIWSDKIGVIENLLEKADKVLIGGGMTYTFYKAQGIEI 240 

Query: 241 GNSLVEEDKLDVAKDLLEKSNGKLILPVDSKEANAFAGYTEVRDTEGEAVSEGFLGLDIG 300 

GNSLVEEDKLDVAKDLLEKSNGKLILPVDSKEANAFAGYTEVRDTEGEAVSEGFLGLDIG 
Sbjct: 241 GNSLVEEDKLDVAKDLLEKSNGKLILPVDSKEANAFAGYTEVRDTEGEAVSEGFLGLDIG 300



Query: 301 PKSIAKFDKALTGAKTVVWNGPMGVFENPDFQAGTIGVMDAIVKQPGVKSIIGGGDSAAA 360

PKSIA+FD+ALTGAKTVVWNGPMGVFENPDFQAGTIGVMDAIVKQPGVKSIIGGGDSAAA
Sbjct: 301 PKSIAEFDQALTGAKTVVWNGPMGVFENPDFQAGTIGVMDAIVKQPGVKSIIGGGDSAAA 360

Query: 361 AINLGRADKFSWISTGGGASMELLEGKVLPGLAALTEK 398

AINLGRADKFSWISTGGGASMELLEGKVLPGLAALTEK 
Sbjct: 361 AINLGRADKFSWISTGGGASMELLEGKVLPGLAALTEK 398 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1583

A DNA sequence (GBSx1677) was identified in S.agalactiae <SEQ ID 4891> which encodes the amino
acid sequence <SEQ ID 4892>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence










INTEGRAL Likelihood = -8.39 Transmembrane 97 - 113 ( 93 - 118)
INTEGRAL Likelihood = -3.66 Transmembrane 25 - ( 24 - 48)
INTEGRAL Likelihood = -3.40 Transmembrane 121 - 137 ( 121 - 140)
INTEGRAL Likelihood = -3.24 Transmembrane 72 - ( 72 - 88)
INTEGRAL Likelihood = -2.07 Transmembrane 143 - 159 ( 143 - 150)



Final Results

bacterial membrane --- Certainty=0.4354 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
A related DNA sequence was identified in S.pyogenes <SEQ ID 4893> which encodes the amino acid
sequence <SEQ ID 4894>. Analysis of this protein sequence reveals the following:

Possible site: 53
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =      Transmembrane 97 -
INTEGRAL Likelihood =      Transmembrane 121 -
INTEGRAL Likelihood =      Transmembrane 25 -
INTEGRAL Likelihood =      Transmembrane 72 -
INTEGRAL Likelihood =      Transmembrane 154 -



Final Results

bacterial membrane Certainty=0.4291 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
An alignment of the GAS and GBS proteins is shown below. 

Identities = 155/178 (87%), Positives = 169/178 (94%)

Query: 1 MKTLKKLLSNYKFDIKKFKLGMRTFKTGLSVFLVLLVFHLFGWKGLQIGALTAVFSLRED 60

MKTL+KLLSNYKFDIKKFKLGMRT KTGLSVFLVLLVFHLFGWKGLQIGALTAVFSLRED 
Sbjct: 1 MKTLRKLLSNYKFDIKKFKLGMRTLKTGLSVFLVLLVFHLFGWKGLQIGALTAVFSLRED 60 

Query: 61 FDKSVHFGFSRIIGNSIGGLLSLVFFAFNEIFHQAFWVTLLIVPICTMLCIMIHVACNNK 120 

FDKSVHFGFSRIIGNSIGGLLSLVFFAFNEIFHQAFWVTLLIVPICTMLCIM+HVACNNK
Sbjct: 61 FDKSVHFGFSRIIGNSIGGLLSLVFFAFNEIFHQAFWVTLLIVPICTMLCI^WNVACNNK 120

Query: 121 SGIIGGTAALLIITLSIPSGETILYVFARIFETFCGVFIAMMVNTDIEILRKKLKNNK 178

SGIIG AALLIITLSIP+G+T +YV +R+FETFCGVF+A++VNTD+E+++ K N K
Sbjct: 121 SGIIGAVAALLIITLSIPTGCTFIYVTSRVFETFCGVFVAILVNTDVELIKNKWFNKK 178

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1584 

A DNA sequence (GBSx1678) was identified in S.agalactiae <SEQ ID 4895> which encodes the amino
acid sequence <SEQ ID 4896>. This protein is predicted to be regulatory protein glnr (glnR). Analysis of
this protein sequence reveals the following:

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA00402 GB:D00513 ORF129 [Bacillus cereus]
Identities = 59/123 (47%), Positives = 89/123 (71%), Gaps = 5/123 (4%)

Query: 4 RELRRTMAVFPIGAVMKLTDLTARQIRYYEDQGLITPERTEGNRRMFSLNDMDRLLEIKD 63

+E RR+ +FPIG VM LT L+ARQIRYYE+ L++P RT+GNRR+FS ND+D+LLEIKD 
Sbjct: 2 KEDRRSAPLFPIGIVMDLTQLSARQIRYYEEHNLVSPTRTKGNRRLFSFNDVDKLLEIKD 61

Query: 64 FISDGLHISDIKNEYMQRQH KSKEKQKSLSDAEVRRLLQDELRNQGRFSSPSQHI 118 

+ GL+++ IK + +++ K KE+ K +S E+R++L+DEL++ GRF+ S 

Sbjct: 62 LLDQGLNMAGIKQVLLMKENQTEAVKVKEETKEISKTELRKILRDELQHTGRFNRTSLRQ 121

Query: 119 GNM 121 
G++ 

Sbjct: 122 GDI 124 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4897> which encodes the amino acid 
sequence <SEQ ID 4898>. Analysis of this protein sequence reveals the following: 

n signal seq 

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:



Query: 4 KELRRSMAVFPIGTVMTLTDLSARQIRYYEDQGLIKPERTQGNRRMFSLNDMDRLLEIKD 63

KE RRS +FPIG VM LT LSARQIRYYE+ L+ P RT+GNRR+FS ND+D+LLEIKD
Sbjct: 2 KEDRRSAPLFPIGIVMDLTQLSARQIRYYEEHNLVSPTRTKGNRRLFSFNDVDKLLEIKD 61

Query: 64 FLSEGLNIAAIKREYVERQG KLMQKQKALTDADVRRILHDEMLTQSGFSTPSQHI 118

L +GLN+A IK+ + ++ K+ ++ K ++ ++R+IL DE+ F+ S

Sbjct: 62 LLDQGLNMAGIKQVLLMKENQTEAVKVKEETKEISKTELRKILRDELQHTGRFNRTSLRQ 121

Query: 119 GN 120 
G+ 

Sbjct: 122 GD 123 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 90/123 (73%), Positives = 108/123 (87%)

Query: 1 MKERELRRTMAVFPIGAVMKLTDLTARQIRYYEDQGLITPERTEGNRRMFSLNDMDRLLE 60

MKE+ELRR+MAVFPIG VM LTDL+ARQIRYYEDQGLI PERT+GNRRMFSLNDMDRLLE
Sbjct: 1 MKEKELRRSMAVFPIGTVMTLTDLSARQIRYYEDQGLIKPERTQGNRRMFSLNDMDRLLE 60

Query: 61 IKDFISDGLHISDIKNEYMQRQHKSKEKQKSLSDAEVRRLLQDELRNQGRFSSPSQHIGN 120

IKDF+S+GL+I+ IK EY++RQ K +KQK+L+DA+VRR+L DE+ Q FS+PSQHIGN 
Sbjct: 61 IKDFLSEGLNIAAIKREYVERQGKLMQKQKALTDADVRRILHDEMLTQSGFSTPSQHIGN 120 

Query: 121 MHL 123 

Sbjct: 121 FRI 123 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1585 

A DNA sequence (GBSx1679) was identified in S.agalactiae <SEQ ID 4899> which encodes the amino
acid sequence <SEQ ID 4900>. This protein is predicted to be glutamine synthetase (glnA). Analysis of this
protein sequence reveals the following:
Possible site: 29

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.2157 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 4901> which encodes the amino acid
sequence <SEQ ID 4902>. Analysis of this protein sequence reveals the following:

Possible site: 29
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.00 Transmembrane 347 - 363 ( 347 - 363) 



Final Results

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 392/448 (87%), Positives = 421/448 (93%)

Query: 1 MTITAEDIRREVKEKNVTFLRLMFTDILGVMKNVEIPATDEQLDKVLSNKAMFDGSSIEG 60 

M IT DIRREVKEKNVTFLRLMFTDI+GVMKNVEIPAT EQLDKVLSNK MFDGSSIEG 
Sbjct: 1 I^ITVADIRREVKEKNVTFLRLMFTDIMGVMKNVEIPATKEQLDKVLSNKVMFDGSSIEG 60

Query: 61 FVRINESDMYLYPDLDTWIVFPWGDENGAVAGLICDIYTAEGEPFAGDPRGNLKRNMKRM 120

FVRINESDMYLYPDLDTWIVFPWGDENGAVAGLICDIYTAEG+PFAGDPRGNLKR +K M
Sbjct: 61 FVRINESDMYLYPDLDTWIVFPWGDENGRVAGLICDIYTAEGKPFAGDPRGNLKRALKHM 120

Query: 121 QEMGYKSFNLGPEPEFFLFKMDENGNPTLDVNDKGGYFDLAPTDLADNTRREIVNVLTQM 180

E+GYKSFNLGPEPEFFLFKMD+ GNPTL+VND GGYFDLAP DLADNTRREIVN+LT+M
Sbjct: 121 ITOIGYKSFmjGPEPEFFLFKMDDKGNPTlE^TOHGGYFDIAPIDIADlSrrRREIVHILTKM 180 

Query: 181 GFEVEASHHEVAVGQHEIDFKYDDVLKACDNIQLFKLWKTIARKHGLYATFMAKPKFGI 240

GFEVEASHHEVAVGQHEIDFKY DVLKACDNIQ+FKLWKTIAR+HGLYATFMAKPKFGI
Sbjct: 181 GFEVEASHHEVAVGQHEIDFKYADVLKACDNIQIFKLWKTIAREHGLYATFMAKPKFGI 240 



Query: 301 KRLVPGYEAPVYVAWAGRNRSPLIRVPASRGMGTRLELRSVDPTANPYLALSVLLGSGLE 360

KRLVPGYEAPVYVAWAG NRSPLIRVPASRGMGTRLELRSVDPTANPYLAL+VLL +GL+
Sbjct: 301 KRLVPGYEAP^mTAMAGSlffiSPLIRVPASRGMGTRLELRSVDPTANPYLAIAVLLEAGLD 360

Query: 361 GIENKIEAPEPIETNIYAMTVEERRQAGIVDLPSTLHNALEALEEDEVVKAALGTHIYTN 420

GI NKIEAPEP+E NIY MT+EER +AGI+DLPSTLHNAL+AL++D+W+ ALG HIYTN
Sbjct: 361 GIINKIEAPEPVEANIYTMTMEERNEAGIIDLPSTLHNALKALQKDDWQKALGYHIYTN 420

Query: 421 FLDAKRIEWASYATYVSQWEIDNYLDLY 448

FL+AKRIEW+SYAT+VSQWEID+Y+ Y
Sbjct: 421 FLEAKRIEWSSYATFVSQWEIDHYIHNY 448



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1586 

A DNA sequence (GBSx1680) was identified in S.agalactiae <SEQ ID 4903> which encodes the amino
acid sequence <SEQ ID 4904>. This protein is predicted to be a SceB precursor. Analysis of this protein
sequence reveals the following:



Final Results

bacterial membrane Certainty=0.0000 (Not Clear)

bacterial outside Certainty=0.0000 (Not Clear)

bacterial cytoplasm Certainty=0.0000 (Not Clear)



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA66624 GB:X97985 ORF1 [Staphylococcus aureus] 
Identities = 44/119 (36%), Positives = 66/119 (54%), Gaps = 4/119 (3%)

Query: 26 SFASTNADANTYt^AVDVDYLASAEEIAOAHPA-SOT 83 

S AS + +N + ++ 1+ + + SN + GQCT+ V + + G+ WG 

Sbjct: 117 SGASYSTTSNNVHVTTTAAPSSNGRSISNGYASGSNLYTSGQCTYYVFDRVGGKIGSTWG 176

Query: 84 NGGDVmASAASADYTVGTQPRVGSIVOmX3SYGHmYVTAvIDPVTNKIQVLESNYAGH 142 

N +WA +AAS+ YTV P+VG+I+ T G YGHVAYV V+ ++V E NY GH 

Sbjct: 177 NASNWAN7AAASSGYTvNNTPKVGAIMQTTQGYYGHVAYVEGVNS-NGSTOVSFJVINY-GH 233 
A related DNA sequence was identified in S.pyogenes <SEQ ID 1013> which encodes the amino acid
sequence <SEQ ID 1014>. Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 60/115 (52%), Positives = 81/115 (70%), Gaps = 7/115 (6%) 

Query: 55 AHPASNTFPLGQCTWGWENIATSJAGNWWGNGGDV/AASAASADYTVGTQPRVGSIVCWTDG 114 
++ +SNT+P+GQCTWG K +A WAGN WGNGG MA SA +A Y G+ P VG+I W DG

Sbjct: 291 SYDSSNTYPVGQCTWGAKSIAPWAG>JNWGNGGQVIAYSAQAAGYRTGSTPMVGAIAVTOIDG 350 

Query: 115 SYGHVAYVTAVDPVTNKIQVLESNYAGHQWIDNYRGWFDPQNTVTPGVVSYIYPN 169 
YGHVA V V ++ I+V+ESNY+G Q+I ++RGWF+P V++IYP+ 
Sbjct: 351 GYGHVAWVEVQSASS-IRVMESNYSGRQYIADHRGWFNPTG VTFIYPH 398

A related GBS gene <SEQ ID 8859> and protein <SEQ ID 8860> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: a 
McG: Discrim Score: 5.85

GvH: Signal Score (-7.5): 3.11

Possible site: 24
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 6.74 threshold: 0.0
PERIPHERAL Likelihood = 6.74 115

modified ALOM score: -1.85 

*** Reasoning Step: 3 

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

37.5/56.7% over 200aa 

Staphylococcus aureus 

GP|1340128| ORF1 Insert characterized

ORF00255(376 - 726 of 1107)

GP|1340128|emb|CAA66624.1| |X97985(33 - 233 of 255) ORF1 {Staphylococcus aureus}
%Match =9.0

%Identity =37.5 %Similarity =56.7
Matches = 45 Mismatches = 47 Conservative Sub.s = 23

294 324 354 384 414 

SVI WI * * TRSHQMEENMNI KQLKSKTMLGTVALVSAFS FASTNADANTYNYAVDVD 

I = : | :| : =| I =| 

5 5 MKIQOTATIATAGIATIAFAGHDAQAAEQNNNGYNSM>AO^ 

10 20 30 40 50 60 70 

462 489 516 546 576 606 

YLASAEEIAQAHPA- SNTFPLGQCTWGV- KEMATWAGNWWGNGGDWAASAASADYTVGTQ 

60 == I: = = II : MM: I = 1= III : I I =111= Ml 

GSGASYSTTSNNVHVTTTAAPSSNGRSISNGYASGSNLYTSGQCTYYVFDRVG 

130 140 150 160 170 180 190 
PRVGSIVCWTDGSYGHVAYVTAVDPVTNKIQVLESNY^^^
1=11=1= i I lllllll h ==l I II II 

Pro7(aIMQTTQGYYGHVAyVEGVNS-NGSVRVSE^mY-GHGAC3VVTSRTISAKQAGSYNFIH 
210 220 230 240 250 

SEQ ID 8860 (GBS30) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 8 (lane 2; MW 19.2kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 16 (lane 2; MW 44.2kDa). 

GBS30-GST was purified as shown in Figure 193, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1587 

A DNA sequence (GBSx1681) was identified in S.agalactiae <SEQ ID 4905> which encodes the amino
acid sequence <SEQ ID 4906>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -3.93 Transmembrane 2 - 18 ( 1 - 18)



Final Results

bacterial membrane Certainty=0.2572 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1588 

A DNA sequence (GBSx1682) was identified in S.agalactiae <SEQ ID 4907> which encodes the amino
acid sequence <SEQ ID 4908>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2160 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06381 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 353/550 (64%), Positives = 443/550 (80%)

Query: 6 LKPEEVGVYAIGGLGEIGKNTYGIEYQDEIIIVDAGIKFPEDDLLGIDYVIPDYSYIVEN 65 

LK + VYA+GGLGEIGKNTY +++QDEII++DAGIKFPED+LLGIDYVIPDYSY+V+N 
Sbjct: 4 LKNNQTAVYALGGLGEIGKNTYAVQFQDEIILIDAGIKFPEDELLGIDYVIPDYSYLVKN 63



Query: 66 IDRIKALVITHGHEDHIGGIPFLLKQANLPIYAGPLALALIKGKLEEHGLLRDATLYEIH 125 
++IK L ITHGHEDHIGGIP+LL++ N+PIY G LAL L++GKLEEHGLLR A L++I

Sbjct: 64 ENKIKGLFITHGHEDHIGGIPYLLREVNIPIYGGKLALGLLRGKLEEHGLLRKAKLHDIQ 123 
Query: 126 ANTELTFKNLSVTFFRTTHSIPEPLGIVIHTPQGKVICTGDFKFDFTPVGEPADLHRMAA 185

+ + F SV+FFRTTHSIP+ GIV+ TP G ++ TGDFKFDFTPVGEPA+L +MA 
Sbjct: 124 EDDIIKFAKTSVSFFRTTHSIPDSYGIWKTPPGNIVHTGDFKFDFTPVGEPANLTKMAK 183

Query: 186 LGEDGVLCLLSDSTNAEVPTFTNSEKIVGQSIMKIIEGIEGRIIFASFASNIFRLQQAAE 245 

+GE+GVLCLLSDSTN+E+P FT SE+ VG+SI I +EGRIIFA+FASNI RLQQA E 
Sbjct: 184 IGEEGVLCLLSDSTNSEIPEFTMSERKVGESIDHIFRRVEGRIIFATFASNIHRLQQAVE 243

Query: 246 AAVKTGRKIAVFGRSMEKAIVNGIELGYIKVPKGTFIEPSELKNLHASEVLIMCTGSQGE 305

+AV+ GRK+AVFGRSME AI G ELGYIK PK TFIEP++L L +EV+I+CTGSQGE 
Sbjct: 244 SAVRYGRKVAVFGRSMESAINIGQELGYIKAPKHTFIEPNQLNKLPDNEVMILCTGSQGE 303 

Query: 306 SMAALARIANGTHRQVTLQPGDTVIFSSSPIPGNTTSVNKLINTIQEAGVDVIHGKINNI 365 

MAAL+R+A GTHRQ+ + PGDTVIFSSSPIPGNT SV+K IN + +AG +VIHG +N+I 
Sbjct: 304 PMAALSRVAFGTHRQIQIIPGDTVIFSSSPIPGNTLSVSKTINQLYKAGANVIHGSLNDI 363

Query: 366 HTSGHGGQQEQKLMLRLIKPKYFMPVHGEYRMQKVHAGLAVDTGIPKENIFIMENGDVLA 425

HTSGHGGQ+EQKLMLRLIKPKYFMP+HGEYRM K+H LA D G+P EN FIM+NGDVLA 
Sbjct: 364 HTSGHGGQEEQKLMLRLIKPKYFMPIHGEYRMLKMHTKLAEDCGVPAENCFIMDNGDVLA 423

Query: 426 LTSDSARIAGHFNAQDIYVDGNGIGDIGAAVLRDRHDLSEDGVVLAVATVDFDSKMILAG 485 

L DA IAG + +YVDGNGIGDIG VLRDR LSE+G+V+ V +++ + AG 

Sbjct: 424 LHPDEAGIAGKIPSGSVYVDGNGIGDIGNIVLRDRRILSEEGLVWWSIiKMKEYKVTAG 483 

Query: 486 PDILSRGFIYMRESGDLIRESQHILFNAIRIALKNKDASIQSVNGAIVNALRPFLYEKTE 545 

PD++SRGF+YMRESGDLI+E+Q +L N ++ ++ K + I + L PFLY++T+ 

Sbjct: 484 PDLISRGFVYMRESGDLIQEAQRLLANHLQEVMERKTNQWSEIKNEITDVLGPFLYDRTK 543 

Query: 546 REPIIIPMVL 555 

R+P+I+P+++ 
Sbjct: 544 RKPMILPIIM 553 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4909> which encodes the amino acid 
sequence <SEQ ID 4910>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

.erminal signal sequence 

■ 484 ( 468 - 484) 

Final Results 

bacterial membrane Certainty=0.1044 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB06381 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 353/550 (64%), Positives = 444/550 (80%)

Query: 6 LKPNEVGVFAIGGLGEIGKNTYGIEYQDEIIIVDAGIKFPEDDLLGIDYVIPDYSYIVDN 65
LK N+ V+A+GGLGEIGKNTY +++QDEII++DAGIKFPED+LLGIDYVIPDYSY+V N
Sbjct: 4 LKNNQTAVYALGGLGEIGKNTYAVQFQDEIILIDAGIKFPEDELLGIDYVIPDYSYLVKN 63

Query: 66
Sbjct: 64

Query: 126
Sbjct: 124

Query: 186
+GEEGVLCLLSDSTN+EIP FT SE+ VG+SI I + GRIIFA+FASNI+RLQQA E
Sbjct: 184
Query: 246 AAVKTGRKIAVFGRSMEKAIVNGIELGYIKVPKGTFIEPSELKNLHASEVLIMCTGSQGE 305

+AV+ GRK+AVFGRSME AI G ELGYIK PK TFIEP++L L +EV+I+CTGSQGE 
Sbjct: 244 SAVRYGRKVAVFGRSMESAINIGQELGYIKAPKNTFIEPNQLNKLPDNEVMILCTGSQGE 303 

Query: 306 SMAALARIANGTHRQVTLQPGDTVIFSSSPIPGNTTSVNKLINTIQEAGVDVIHGKVNNI 365

MAAL+R+A GTHRQ+ + PGDTVIFSSSPIPGNT SV+K IN + +AG +VIHG +N+I
Sbjct: 304 PMAALSRVAFGTHRQIQIIPGDTVIFSSSPIPGNTLSVSKTINQLYKAGANVIHGSLNDI 363 

Query: 366 HTSGHGGQQEQKLMLSLIKPKYFMPVHGEYRMQKVHAGLAMDIGIPKENIFIMENGDVIA 425 

HTSGHGGQ+EQKLML LIKPKYFMP+HGEYRM K+H LA D G+P EN FIM+NGDVLA
Sbjct: 364 HTSGHGGQEEQKLMLRLIKPKYFMPIHGEYRMLKMHTKLAEDCGVPAENCFIMDNGDVLA 423

Query: 426 LTSDSARIAGHFNAQDIYVDGNGIGDIGAAVLRDRRDLSEDGVVLAVATVDFNTQMILAG 485 

L DA IAG + +YVDGNGIGDIG VLRDRR LSE+G+V+ V +++ + AG 

Sbjct: 424 LHPDEAGIAGKIPSGSVYVDGNGIGDIGNIVLRDRRILSEEGLVVVWSLNMKEYKVTAG 483 

Query: 486 PDILSRGFIYMRESGDLIRESQRVLFNAIRIALKNKDASIQSVNGAIVNALRPFLYEKTE 545 

PD++SRGF+YMRESGDLI+E+QR+L N ++ ++ K + I + L PFLY++T+

Sbjct: 484 PDLISRGFVYMRESGDLIQEAQRLLANHLQEVMERKTNQWSEIKNEITDVLGPFLYDRTK 543 

Query: 546 REPIIIPMVL 555 

R+P+I+P+++ 
Sbjct: 544 RKPMILPIIM 553 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 523/559 (93%), Positives = 550/559 (97%)

Query: 1 MSNINLKPEEVGVYAIGGLGEIGKNTYGIEYQDEIIIVDAGIKFPEDDLLGIDYVIPDYS 60 

M+NI+LKP EVGV+AIGGLGEIGKNTYGIEYQDEIIIVDAGIKFPEDDLLGIDYVIPDYS 
Sbjct: 1 MTNISLKPNEVGVFAIGGLGEIGKNTYGIEYQDEIIIVDAGIKFPEDDLLGIDYVIPDYS 60 

Query: 61 YIVENIDRIKALVITHGHEDHIGGIPFLLKQANLPIYAGPLALALIKGKLEEHGLLKDAT 120 

YIV+N+DR+KALVITHGHEDHIGGIPFLLKQAN+PIYAGPLALALI+GKLEEHGL R+AT 
Sbjct: 61 YIVDNLDRVKALVITHGHEDHIGGIPFLLKQANIPIYAGPLALALIRGKLEEHGLWREAT 120 

Query: 121 LYEIHANTELTFKNLSVTFFRTTHSIPEPLGIVIHTPQGKVICTGDFKFDFTPVGEPADL 180



Query: 181 HRMAALGEDGVLCLLSDSTNAEVPTFTNSEKIVGQSIMKIIEGIEGRIIFASFASNIFRL 240 

+RMAALGE+GVLCLLSDSTNAE+PTFTNSEK+VGQSI+KIIEGI GRIIFASFASNI+RL
Sbjct: 181 QRMAALGEEGVLCLLSDSTNAEIPTFTNSEKVVGQSILKIIEGIHGRIIFASFASNIYRL 240

Query: 241 QQAAEAAVKTGRKIAVFGRSMEKAIVNGIELGYIKVPKGTFIEPSELKNLHASEVLIMCT 300 

QQAAEAAVKTGRKIAVFGRSMEKAIVNGIELGYIKVPKGTFIEPSELKNLHASEVLIMCT
Sbjct: 241 QQAAEAAVKTGRKIAVFGRSMEKAIVNGIELGYIKVPKGTFIEPSELKNLHASEVLIMCT 300

Query: 301 GSQGESMAALARIANGTHRQVTLQPGDTVIFSSSPIPGNTTSVNKLINTIQEAGVDVIHG 360

GSQGESMAALARIANGTHRQVTLQPGDTVIFSSSPIPGNTTSVNKLINTIQEAGVDVIHG
Sbjct: 301 GSQGESMAALARIANGTHRQVTLQPGDTVIFSSSPIPGNTTSVNKLINTIQEAGVDVIHG 360

Query: 361 KINNIHTSGHGGQQEQKLMLRLIKPKYFMPVHGEYRMQKVHAGLAVDTGIPKENIFIMEN 420

K+NNIHTSGHGGQQEQKLML LIKPKYFMPVHGEYRMQKVHAGLA+D GIPKENIFIMEN
Sbjct: 361 KVNNIHTSGHGGQQEQKLMLSLIKPKYFMPVHGEYRMQKVHAGLAMDIGIPKENIFIMEN 420

Query: 421 GDVLALTSDSARIAGHFNAQDIYVDGNGIGDIGAAVLRDRHDLSEDGVVLAVATVDFDSK 480

GDVLALTSDSARIAGHFNAQDIYVDGNGIGDIGAAVLRDR DLSEDGVVLAVATVDF+++
Sbjct: 421 GDVLALTSDSARIAGHFNAQDIYVDGNGIGDIGAAVLRDRRDLSEDGVVLAVATVDFNTQ 480

Query: 481 MILAGPDILSRGFIYMRESGDLIRESQHILFNAIRIALKNKDASIQSVNGAIVNALRPFL 540

MILAGPDILSRGFIYMRESGDLIRESQ +LFNAIRIALKNKDASIQSVNGAIVNALRPFL
Sbjct: 481 MILAGPDILSRGFIYMRESGDLIRESQRVLFNAIRIALKNKDASIQSVNGAIVNALRPFL 540 



Query: 541 YEKTEREPIIIPMVLTPDK 559

YEKTEREPIIIPMVLTPDK
Sbjct: 541 YEKTEREPIIIPMVLTPDK 559

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1589 

A DNA sequence (GBSx1683) was identified in S.agalactiae <SEQ ID 4911> which encodes the amino
acid sequence <SEQ ID 4912>. Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2932 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13327 GB:Z99111 ykzG [Bacillus subtilis]

Identities = 27/75 (36%), Positives = 44/75 (58%), Gaps = 7/75 (9%) 

Query: 1 MIYKVFYQETKERNPRREQTKTLYVTIDAANELEGRIAARKLVEENTAYNIEFIELLSDK 60
MIYKVFYQE + P RE+T +LY+ + ++ ++ +K +NIEFI + 

Sbjct: 1 MIYKVFYQEKADEVPVREKTDSLYIEGVSERDVRTICLKEKK FNIEFITPVDGA 53

Query: 61 HLEYEKETGVFELTE 75 

LEYE+++ F++ E 
Sbjct: 54 FLEYEQQSENFKVLE 68 


A related DNA sequence was identified in S.pyogenes <SEQ ID 4913> which encodes the amino acid
sequence <SEQ ID 4914>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3428 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 60/76 (78%), Positives = 70/76 (91%) 

Query: 1 MIYKVFYQETKERNPRREQTKTLYVTIDAANELEGRIAARKLVEENTAYNIEFIELLSDK 60
MIYKVFYQETK+++PRRE TK LY+ IDA +EL+GRI AR+LVE+NT YN+EFIELLSDK

Sbjct: 1 MIYKVFYQETKDQSPRRESTKALYLNIDATDELDGRIK.^LVEDNTYYNVEFIELLSDK 60 

Query: 61 HLEYEKETGVFELTEF 76 
HL+YEKETGVFELTEF 
Sbjct: 61 HLDYEKETGVFELTEF 76

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
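In each pairwise alignment the middle line carries the residue letter where Query and Sbjct agree, "+" for a conservative substitution, and a blank elsewhere, so identities and positives can be tallied segment by segment. A minimal Python sketch, assuming the three lines of a segment are already aligned column for column (the function name count_matches is hypothetical and not part of the original analysis), applied to the last segment of the GAS/GBS alignment in Example 1589 (Query 61-76):

```python
def count_matches(query, midline, subject):
    """Count identities and positives in one aligned segment.

    The midline holds the residue letter at identical positions, '+' at
    conservative substitutions and a space elsewhere.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("segments must be the same length")
    identities = positives = 0
    for q, m, s in zip(query, midline, subject):
        if m.isalpha():
            if m != q or q != s:
                raise ValueError(f"midline letter {m!r} does not match {q!r}/{s!r}")
            identities += 1
            positives += 1  # every identity also counts as a positive
        elif m == "+":
            positives += 1
    return identities, positives


print(count_matches("HLEYEKETGVFELTEF", "HL+YEKETGVFELTEF", "HLDYEKETGVFELTEF"))  # (15, 16)
```

Summed over all segments of an alignment, such counts correspond to the "Identities" and "Positives" totals given in its header.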

Example 1590 

A DNA sequence (GBSx1684) was identified in S.agalactiae <SEQ ID 4915> which encodes the amino
acid sequence <SEQ ID 4916>. This protein is predicted to be a glycoprotein endopeptidase. Analysis of this
protein sequence reveals the following: 

Possible site: 13 
>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.0430 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA76861 GB:Y17797 hypothetical protein [Enterococcus faecalis]
Identities = 94/182 (51%), Positives = 127/182 (69%), Gaps = 6/182 (3%)



Query: 2
Sbjct: 13

Query: 62
WAEGPGSYTGLR+ V TAK IAYTLK +LVG+SSL AL N + L+VPL DARR N
Sbjct: 73

Query: 121 VYVGFYQNGDTV KPDCHTSLEEVLQEVGNKANVHFVGE-VAAFFDQIKKALPHAKI 175
VY G Y+ D V PD H SL E+D+++ N+ N+ FVGE V F ++I + +PH +I
Sbjct: 133 VYAGAYRFVDGVWQNELPDQHISLRELLEQLKNEPNLFFVGEDVEKFTEEIAQIIPHGEI 192

Query:
Sbjct: 193

A related DNA sequence was identified in S.pyogenes <SEQ ID 4917> which encodes the amino acid
sequence <SEQ ID 4918>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.38 Transmembrane 99 - 115 ( 99 - 115)



Final Results 

bacterial membrane Certainty=0.1553 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9159> which encodes the amino acid sequence 
<SEQ ID 9160>. Analysis of this protein sequence reveals the following: 

Possible site: 25 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.38 Transmembrane 88 - 104 ( 88 - 104) 

Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 134/232 (57%), Positives = 172/232 (73%), Gaps = 3/232 (1%) 

Query: 2 MKVIjAFDTSSKALSVAVLN^ECIATVTINIKKNHSINLMPAIDFLMQSIDLEPQDLDRI 61 

MK LAFDTS+K LS+A+L+4- LA +T+NI+K HS++LMPAIDFLM DL+PQDL+RI 
Sbjct: 12 MKTLAFDTSNKTLSIAILDDETLLADMTMTIQKKHSVSLMPAIDFLMTCTDLKPQDLERI 71 

Query: 62 WAEGPGSYTGLRVAVATAKMLAYTLKIDLVGVSSLYALTNGFSE NDLLVPLIDARR 118 

WA+GPGSYTGLRVAVATAK LAY+L I LVG+SSLYAL + N L+VPLIDARR 

Sbjct: 72 WAKGPGSYTGLRVAVATAKTLAYSLNIALVGISSLYALAASTCKQYPNTLVVPLIDARR 131 
Sbjct: 132 QNAYVGYYRQGKSVMPQAHASLEVIIEQLVEEGQLIFVGETAPFAEKIQKKLPQAILLPT 191 

Query: 179 LPCAVAIGRKGQKMKSVNVDAFVPRYLKRVEAEENWLKNHCETNTEEYIKEV 230 

LP A G GQ + NVDAFVP+YLKRVEAEENWLK++ + Y+KR+ 
Sbjct: 192 LPSAYECGLLGQSIAPEHVDAFVPQYLKRVEAEENWLKDNSIKDDSHYVKRI 243 
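
The percentages in the Identities lines of this section match integer truncation rather than rounding. For example, 134/232 is 57.8% and is reported as 57%, whereas rounding would give 58%. The sketch below reproduces the reported identity figures from their counts. Truncation is an assumption inferred from these figures, and the function name is hypothetical.

```python
def blast_percent(matches: int, length: int) -> int:
    """Percentage of aligned positions, truncated to an integer (assumed convention)."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    # Floor division of non-negative integers truncates the fractional part.
    return (100 * matches) // length


# Identity counts copied from Identities lines in this section.
for matches, length in [(134, 232), (194, 331), (288, 334)]:
    print(f"{matches}/{length} -> {blast_percent(matches, length)}%")
```

This prints 57%, 58% and 86%, matching the reported values. The Gaps figure 3/232 (1%) is consistent with the same convention.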

SEQ ID 4916 (GBS69) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 9; MW 28.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 20 (lane 4; MW 53.9kDa). 

The GBS69-GST fusion product was purified (Figure 197, lane 6) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 285), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1591 

A DNA sequence (GBSx1685) was identified in S.agalactiae <SEQ ID 4919> which encodes the amino 
acid sequence <SEQ ID 4920>. This protein is predicted to be a ribosomal-protein-alanine acetyltransferase. 
Analysis of this protein sequence reveals the following: 
Possible site: 22 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10137> which encodes amino acid sequence <SEQ ID 
10138> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC06803 GB:AE00069S ribosomal-protein-alanine acetyltransferase 
[Aquifex aeolicus] 

Identities = 44/141 (31%), Positives = 74/141 (52%), Gaps = 8/141 (5%) 

Query: 9 LREFEMESSEQALAIWSVLSDVYDKSPWSLSQISEDLKKDSTDYFFVYNDGEVIGFLALQ 68 

+RE EE E+ ++ + + + WS +D + + F + DG+V+G++ 

Sbjct: 4 VREMEREDVER VYEINRESFTTDAWSRFSFEKDFENPCFSRRFVLEEDGKWGYVIFW 60 

Query: 69 QLVGEVEITNIAVKKNYQGKGYAYQLMSMIADIEVPVFLEVRYSNIVAQKLYERCG 124 

+ E I A+ Y+GKGY +L+ S + D V L+VR SN+ A LY++ G 
Sbjct: 61 WKEEATIMTFAIAPGYRGKGYGEKLLREAISRLGDKVKRWLDWKSNLRAINLYKKLG 120 

Query: 125 EWLRKRKNYYHDPIEDAIVM 145 

F V+ +RK YY D E+A++M 
Sbjct: 121 FKWTERKGYYSDG-ENALLM 140 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4921> which encodes the amino acid 
sequence <SEQ ID 4922>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

bacterial cytoplasm --- Certainty=0.3800 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 65/140 (46%), Positives = 96/140 (68%), Gaps = 1/140 (0%) 

Query: 9 LREFEMES-SEQALAIWSVLSDVYDKSPWSLSQISEDLKKDSTDYFFVYNDGEVIGFLAL 67 

L E M++ EQA 1+ +L VY SPW+L Q+ D+++D TDYF +Y+ +++GFLA+ 
Sbjct: 6 LSESNMKTVEEQAKNIYQLLEMVYGTSPMTL3QVLID1RRDQTDYFLLYDHDKLLGFLAI 65 

Query: 68 QQLVGEVEITNIAVKKNYQGKGYAYQLMSMIADIEVPVFLEVRYSNIVAQKLYERCGFW 127 

Q L GEVE+T IA+ ++Q G A QLM+ + IE +FLEW SN AQ LY++ GF 
Sbjct: 66 QDLAGEVEMTQIAILPSHQELGLASQLMTHLDSIESDIFLEVRESNHRAQGLYQKFGFKF 125 

Query: 128 LRKRKNYYHDPIEDAIVMRK 147 

+ KR +YY +PIE A++M++ 
Sbjct: 126 IGKRPDYYRNPIETALLMKR 145 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1592 

A DNA sequence (GBSx1686) was identified in S.agalactiae <SEQ ID 4923> which encodes the amino 
acid sequence <SEQ ID 4924>. Analysis of this protein sequence reveals the following: 
Possible site: 21 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.0334 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1593 

A DNA sequence (GBSx1687) was identified in S.agalactiae <SEQ ID 4925> which encodes the amino 
acid sequence <SEQ ID 4926>. Analysis of this protein sequence reveals the following: 
Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.75 Transmembrane 86 - 102 ( 86 - 104) 

Final Results 

bacterial membrane --- Certainty=0.1702 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04267 GB:AP001508 glycoprotein endopeptidase [Bacillus halodurans] 
Identities = 194/331 (58%), Positives = 263/331 (78%), Gaps = 1/331 (0%) 
Sbjct: 12 ILAIETSCDETSAAVIENGTTILSNVVSSQIDSHKRFGGVVPEIASRHHVEQITVIVEEA 71 

Query: 66 LQEAGIVASDLDAVAVTYGPGLVGALLVGMAAAKAFAWANKLPLIPINHMAGHLMAARDV 125 

+ EAG+ +DL AVAVT GPGLVGALL+G+ AAKA A+A++LPLI ++H+AGH+ A R + 
Sbjct: 72 MHEAGVDFADLAAVAVTEGPGLVGALLIGVNAAKAIAFAHQLPLIGVHHIAGHIYANRLL 131 

Query: 126 KELQYPLLALLVSGGHTELVYVSEPGDYKIVGETRDDAVGEAYDKVGRVMGLTYPAGREI 185 

KEL++PLLAL+VSGGHTEL+Y+ G+++++GETRDDAVGEAYDKV R +GL YP G I 
Sbjct: 132 KELEFPLLALVVSGGHTELIYMENHGEFEVIGETRDDAVGEAYDKVARTLGLPYPGGPHI 191 

Query: 186 DQLAHKGQDTYHFPRAMIKEDHLEFSFSGLKSAFINLHHNAEQKGEALVLEDLCASFQAA 245 

D+LA G+DT FPRA ++ D +FSFSGLKSA IN HNA+Q+GE + ED+ ASFQA+ 
Sbjct: 192 DRLAVNGEDTLQFPRAWLEPDSFDFSFSGLKSAVINTLHNAKQRGENVQAEDVAASFQAS 251 

Query: 246 VLDILLAKTQKALLKYPVKTLVVAGGVAANQGLRERLATDISPD-IDVVIPPLRLCGDNA 304 

V+D+L+ KT+KA +Y V+ +++AGGVAAN+GLR L + ID+VIPPL LC DNA 

Sbjct: 252 VIDVLVTKTKKAAEEYKVRQVLLAGGVAANKGLRTALEEAFFKEPIDLVIPPLSLCTDNA 311 

Query: 305 GMIALAAAIEFEKENFASLKLNAKPSLAFES 335 

MI AA+I+F+++ FA + LN +PSL E+ 
Sbjct: 312 AMIGAAASIKFKQQTFAGMDLNGQPSLELEN 342 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4927> which encodes the amino acid 
sequence <SEQ ID 4928>. Analysis of this protein sequence reveals the following: 

Possible site: 38 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = 

Final Results 

bacterial membrane --- Certainty=0.2105 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB04267 GB:AP001508 glycoprotein endopeptidase [Bacillus halodurans] 
Identities = 196/330 (59%), Positives = 255/330 (76%), Gaps = 2/330 (0%) 

Query: 6 ILAVESSCDETSVAILKNESTLLSNVIASQVESHKRFGGVVPEVASRHHVEVITTCFEDA 65 

ILA+E+SCDETS A+++N +T+LSNV++SQ++SHKRFGGVVPE+ASRHHVE IT E+A 
Sbjct: 12 ILAIETSCDETSAAVIENGTTILSNVVSSQIDSHKRFGGVVPEIASRHHVEQITVIVEEA 71 

Query: 66 LQEAGISASDLSAVAVTYGPGLVGALLVGLAAAKAFAWANHLPLIPVNHMAGHLMAAREQ 125 

+ EAG+ +DL+AVAVT GPGLVGALL+G+ AAKA A+A+ LPLI V+H+AGH+ A R 
Sbjct: 72 MHEAGVDFADLAAVAVTEGPGLVGALLIGVNAAKAIAFAHQLPLIGVHHIAGHIYANRLL 131 

Query: 126 KPLVYPLIALLVSGGHTELVYVPEPGDYHIIGETRDDAVGEAYDKVGRVMGLTYPAGREI 185 

K L +PL+AL+VSGGHTEL+Y+ G++ +IGETRDDAVGEAYDKV R +GL YP G I 
Sbjct: 132 KELEFPLLALVVSGGHTELIYMENHGEFEVIGETRDDAVGEAYDKVARTLGLPYPGGPHI 191 

Query: 186 DQLAHKGQDTYHFPRAMITEDHLEFSFSGLKSAFINLHHNAKQKGDELILEDLCASFQAA 245 

D+LA G+DT FPRA + D +FSFSGLKSA IN HNAKQ+G+ + ED+ ASFQA+ 
Sbjct: 192 DRLAVNGEDTLQFPRAWLEPDSFDFSFSGLKSAVINTLHNAKQRGENVQAEDVAASFQAS 251 

Query: 246 VLDILLAKTKKALSRYPAKMLVVAGGVAANQGLRDRLAQEI--THIEVVIPKLRLCGDNA 303 

V+D+L+ KTKKA Y + +++AGGVAAN+GLR L + I++VIP L LC DNA 

Sbjct: 252 VIDVLVTKTKKAAEEYKVRQVLLAGGVAANKGLRTALEEAFFKEPIDLVIPPLSLCTDNA 311 

Query: 304 GMIALAAAIEYDKQHFANMSLNAKPSLAFD 333 

MI AA+I++ +Q FA M LN +PSL + 
Sbjct: 312 AMIGAAASIKFKQQTFAGMDLNGQPSLELE 341 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 288/334 (86%), Positives = 313/334 (93%), Gaps = 1/334 (0%) 
Query: 1 MKDRYILAVESSCDETSVAILKNDKELLMillASQVESHKRFGGVVPEVASRHHVEVVTT 60
M DRYILAVESSCDETSVAILKN+ LL+N+IASQVESHKRFGGVVPEVASRHHVEV+TT
Sbjct: 1 MTDRYILAVESSCDETSVAILKNESTLLSNVIASQVESHKRFGGVVPEVASRHHVEVITT 60

Query: 61 CFEDALQEAGIVASDLDAVAVTYGPGLVGALLVGMAAAKAFAWANKLPLIPINHMAGHLM 120
CFEDALQEAGI ASDL AVAVTYGPGLVGALLVG+AAAKAFAWAN LPLIP+NHMAGHLM
Sbjct: 61 CFEDALQEAGISASDLSAVAVTYGPGLVGALLVGLAAAKAFAWANHLPLIPVNHMAGHLM 120

Query: 121 AARDVKELQYPLLALLVSGGHTELVYVSEPGDYKIVGETRDDAVGEAYDKVGRVMGLTYP 180
AAR+ K L YPL+ALLVSGGHTELVYV EPGDY I+GETRDDAVGEAYDKVGRVMGLTYP
Sbjct: 121 AAREQKPLVYPLIALLVSGGHTELVYVPEPGDYHIIGETRDDAVGEAYDKVGRVMGLTYP 180

Query: 181 AGREIDQLAHKGQDTYHFPRAMIKEDHLEFSFSGLKSAFINLHHNAEQKGEALVLEDLCA 240
AGREIDQLAHKGQDTYHFPRAMI EDHLEFSFSGLKSAFINLHHNA+QKG+ L+LEDLCA

Query: 241 SFQAAVLDILLAKTQKALLKYPVKTLVVAGGVAANQGLRERLATDISPDIDVVIPPLRLC 300
SFQAAVLDILLAKT+KAL +YP K LVVAGGVAANQGLR+RLA +I+ I+VVIP LRLC
Sbjct: 241 SFQAAVLDILLAKTKKALSRYPAKMLVVAGGVAANQGLRDRLAQEIT-HIEVVIPKLRLC 299

Query: 301 GDNAGMIALAAAIEFEKENFASLKLNAKPSLAFE 334
GDNAGMIALAAAIE++K++FA++ LNAKPSLAF+
Sbjct: 300 GDNAGMIALAAAIEYDKQHFANMSLNAKPSLAFD 333





SEQ ID 4926 (GBS371) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 64 (lane 7; MW 41kDa), in Figure 170 (lanes 4 and 5; MW 55kDa) and in Figure 
239 (lane 6; MW 55kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of 
total cell extract is shown in Figure 69 (lane 7; MW 65kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1594 

A DNA sequence (GBSx1688) was identified in S.agalactiae <SEQ ID 4929> which encodes the amino 
acid sequence <SEQ ID 4930>. Analysis of this protein sequence reveals the following: 

Possible site: 33 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1027 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1595 

A DNA sequence (GBSx1689) was identified in S.agalactiae <SEQ ID 4931> which encodes the amino 
acid sequence <SEQ ID 4932>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1307 (Affirmative) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1596 

A DNA sequence (GBSx1690) was identified in S.agalactiae <SEQ ID 4933> which encodes the amino 
acid sequence <SEQ ID 4934>. This protein is predicted to be L41 71-60 protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 36 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10135> which encodes amino acid sequence <SEQ ID 
10136> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 








No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 4934 (GBS648) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 131 (lanes 8-10; MW 60kDa) and in Figure 186 (lane 6; MW 60kDa). It was also 
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 131 
(lane 12; MW 35kDa), in Figure 140 (lane 10; MW 35kDa) and in Figure 178 (lane 7; MW 35kDa). 

Purified GBS648-GST is shown in Figure 243, lane 6; purified GBS648-His is shown in Fig. 229, lane 7. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1597 

A DNA sequence (GBSx1691) was identified in S.agalactiae <SEQ ID 4935> which encodes the amino 
acid sequence <SEQ ID 4936>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2279 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1598 

A DNA sequence (GBSx1692) was identified in S.agalactiae <SEQ ID 4937> which encodes the amino 
acid sequence <SEQ ID 4938>. This protein is predicted to be ribosomal protein S14 (rpsN). Analysis of 
this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3848 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12716 GB:Z99108 similar to ribosomal protein S14 [Bacillus subtilis] 
Identities = 67/89 (75%), Positives = 76/89 (85%) 

Query: 1 MAKKSKIAKFQKQQKLVEQYAELRRELKEKGDYEALRKLPKDSNPNRLKNRDLIDGRPHA 60 

MAKKSK+AK K+Q+LVEQYA +RRELKEKGDYEAL KLP+DS P Rh NR ++ GRP A 
Sbjct: 1 mKKSKVAKEDKRQQLVEQYAGIRRELKEKGDYEALSKLPRDSAPGRLfiNRCM\n:GRPRA 60 

Query: 61 YMRKFGMSRINFRNLAYKGQIPGIKKASW 89 

YMRKF MSRI FR LA+KGQIPG+ KKASW 
Sbjct: 61 YMRKFKMSRIAFRELAHKGQIPGVKKASW 89 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4939> which encodes the amino acid 
sequence <SEQ ID 4940>. Analysis of this protein sequence reveals the following: 

:> N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3799 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 73/89 (82%), Positives = 85/89 (95%) 



Query: 61 YMRKFGMSRINFRNLAYKGQIPGIKKASW 89 

YMRKFG+SRINFR+LA+KGQ+PG+ KASW 
Sbjct: 61 YMRKFGVSRINFRDLAHKGQLPGVTKASW 89 
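
The final block of this GAS/GBS alignment (Query 61-89 against Sbjct 61-89) has no gaps, so identical residues can be counted column by column. The sketch below counts identities in that block and lists the differing positions using the query numbering. The helper name is hypothetical.

```python
from typing import List, Tuple


def compare_ungapped(query: str, sbjct: str, start: int) -> Tuple[int, List[Tuple[int, str, str]]]:
    """Count identical columns in an ungapped block and report mismatches.

    `start` is the residue number of the first query column.
    """
    if len(query) != len(sbjct):
        raise ValueError("an ungapped block needs sequences of equal length")
    identities = 0
    mismatches = []
    for offset, (q, s) in enumerate(zip(query, sbjct)):
        if q == s:
            identities += 1
        else:
            mismatches.append((start + offset, q, s))
    return identities, mismatches


query = "YMRKFGMSRINFRNLAYKGQIPGIKKASW"
sbjct = "YMRKFGVSRINFRDLAHKGQLPGVTKASW"
ident, diffs = compare_ungapped(query, sbjct, start=61)
print(f"{ident}/{len(query)} identical")  # 23/29 identical
for pos, q, s in diffs:
    print(pos, q, s)
```

Five of the six differences (M/V, N/D, Y/H, I/L, I/V) are scored as similar (+). K/T at position 85 is not. These counts cover this block only, not the whole 89-residue alignment.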

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1599 

A DNA sequence (GBSx1693) was identified in S.agalactiae <SEQ ID 4941> which encodes the amino 
acid sequence <SEQ ID 4942>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.5183 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB95931 GB:AL359989 galactose-1-phosphate uridylyltransferase 
[Streptomyces coelicolor A3(2)] 
Identities = 31/105 (29%), Positives = 51/105 (48%), Gaps = 4/105 (3%) 

Query: 27 DKCPFC--DKSQLGKILDVKDDMIWVENKYPTL--EETYQTLVIESNDHNGDISVYSESK 82 

D+CP C D +L +1 D D++ EN++P+L + +V ++DH+ + SE + 

Sbjct: 68 DQCPLCPSDGERLSEIPDSAYDVWFENRFPSIAGDSGRCEWCFTSDHDASFADLSEEQ 127 

Query: 83 MRQLLDYLLSKWQLMEESGHYRSVVLYRNFGPLSGGSLRHPHSQI 127 

R +LD + + V + M G G +h HPH QI 

Sbjct: 128 ARLVLDAWTDRTSELSHLPSVEQVFCFENRGAEIGVTLGHPHGQI 172 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1600 

A DNA sequence (GBSx1694) was identified in S.agalactiae <SEQ ID 4943> which encodes the amino 
acid sequence <SEQ ID 4944>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10133> which encodes amino acid sequence <SEQ ID 
10134> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 
>GP:BAB06998 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 186/410 (45%), Positives = 258/410 (62%), Gaps = 27/410 (6%) 

YDTIIIGGGPAGM^ISSNFYCOTTLLIFJCt^^ 63 
++ I+IGGGPAG+MA++S+ +G + LL++K +LG+KLA +GGGRCNVTN LDEL+A 
HEVIVIGGGPAGLMASVSAAEHGARVLLI^KGDKLGRKIAISGGGRCWrMRMPIiDELIA 6 1 



IPGNGRF4YS FS F+N DII FF+ G+ LKEED GRMFP +DK+ T++ L +1 4 






V G +S+ K+FVT 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4945> which encodes the amino acid 
sequence <SEQ ID 4946>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.0448 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 308/386 (79%), Positives = 344/386 (88%)

Query: 1 MKHYDTIIIGGGPAGMMAAISSNFYGNKTLLIEKNKRI/3KKLAGTGGGRCMVTNNGNLDE 60
M YDTIIIGGGPAGMMAAISS++YG JCTLLIEKN+RI^KKIAGTGGGRCNVTN+GNLD
Sbjct: 1 MTQYDTIIIGGGPAGMMAAISSSYYGYKTLLIEIOIF^LGKKLAGTGGGRCmmiSGmjDV 60

IK LGGQ++T TEWSVKK D FY+K+ D F KLIVTTGGKSYPSTGSTGFGHDIA

RHFKL VTD+EAAESPLLTDFPHK LQGISLDDVTLS++KH+ITHDLLFTHFGLSGPAAL
VKQ+S K + L+ KLK L I +TGKMSIAKSFVTKGGVDLKEINPKTLESKKV GL+FA 
Sbjct: 301 WWQLSPKQEKELLDKLKHLQIPITGKMSIAKSFVTKGGVDLKEINPKTLESKKVPGLYFA 360 

Query: 361 GEVLDINAHTGGFNITSALCTGWVAG 386 

GEVLDINAHTGGFNITSALC+GW+AG 
Sbjct: 361 GEVLDINAHTGGFNITSALCSGWIAG 386 

SEQ ID 4944 (GBS196) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 26 (lane 3; MW 44.5kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 37 (lane 4; MW 69.5kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1601 

A DNA sequence (GBSx1695) was identified in S.agalactiae <SEQ ID 4947> which encodes the amino 
acid sequence <SEQ ID 4948>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm Certainty=0. 1550 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10131> which encodes amino acid sequence <SEQ ID 
10132> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 19 KTVSEIAEILGVSRQAMNNRV-KTLPEECVEK NSKGVTWNRDGLIKLEEIYKKTIL 74 

KT+ ELA+ LGVS+Q + N++ K E+ V+ V+N G + KKT+ 

Sbjct: 6 KTIKELADELGVSKQTIRNKIDKDFREKFVQTIKIKGNNTLVINNAGY SLLKKTLQ 61 

Query: 75 EEEPIDEEASRRELLEILVDEKNTEITRLYEQLKAKDIQIASKDEQLHVKDIQIAEKDKQ 134 

+ +++ + + IL EQL K+ Q++ KD+QL KD QI++ 

Sbjct: 62 NDTAQTAKTLQNDTAQTKL ICFLEEQLDKKEQQLSVKDKQLENKDTQISQMQNL 115 

Query: 135 LDQQQQLTLTAMEDTQRLQLELNEAKA EVEEIQEAKEEKIQELEAVK 181 

LDQQQ+L L + + 4- E+NE KA ++++ + E -t-E+E +K 

Sbjct: 116 LDQQQRLALQDKKLLEEYKSEINELKALKMPR3DMKEGSSIRGEAQEEIERLK 168 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4949> which encodes the amino acid 
sequence <SEQ ID 4950>. Analysis of this protein sequence reveals the following: 

Possible site: 14 
>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3951 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 132/194 (68%) , Positives = 154/194 (79%) , Gaps = 4/194 (2%) 
Query: 1 MIFFYKKI STKEEVMTVEKWSELAEILGVSRQAMNNRVKTLPEECVEKNSKGVTW 57 

M+ F +1 S KEE M +EKWSELA+ILGVSRQR+MNRVK+LPEE ++KN KGVTW 
Sbjct: 1 ^WLFIJIRIFSDSDKEE]SIMGIEKWSEL7U3ILGVSRQAVNNRVKSLPEEDLDI<NEKGVTVV 60 

Query: 58 NRDGLIKLEEIYKKTILEEEPIDEEASRRELLEILVDEKNTEITRLYEQLKAKDIQIASK 117 

R GL+KLEEIYKKTI ++EPI EE +RELLEILVDEKNTEITRLYEQLKAKD Q+ASK 
Sbjct: 61 KRSGLVKLEEIYKKIIFDDEPISEETKQRELLEILVDEKNTEITRLYEQLKAKDAQLASK 120 

Query: 118 DEQLHVKDIQIAEKDKQLDQQQQLTLTAMEDTQRLQLELNEAKAEVEEIQEAKEEKIQEL 177 

DEQ+ VKD+QIAEKDKQLDQQQQLT AM D + L+LEL EAKAE + + + E++Q 
Sbjct: 121 DEQMRVKDVQIAEKDKQLDQQQQLTAKAMADKETLKLELEEAKAEANQAR-LQVEEVQAE 179 

Query: 178 EAVKKSFFGRFFNK 191 

KK FF R F K 
Sbjct: 180 VGPKKGFFTRLFAK 193 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1602 

A DNA sequence (GBSx1697) was identified in S.agalactiae <SEQ ID 4951> which encodes the amino 
acid sequence <SEQ ID 4952>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2157 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06137 GB:AP001515 DNA polymerase III (alpha subunit) 
[Bacillus halodurans] 
Identities = 31/87 (35%) , Positives = 52/87 (59%) , Gaps = 1/87 (1%) 

Query: 13 EYIAFDLEFNTVGE-HSHIIQVSAVKYSNHQEIALFDTYVHTKVPLQSFINGLTGITARD 71 

E++ FD+E + ++ II+++AVK N + I F+ + PL + I LTGIT 

Query: 72 IIGAPKIEIVLTDFQSFVGDTPLIGYN 98 
+ G P++E VL +F +F+GD L+ +N 

Sbjct: 478 LKGQPEVEQVLNEFHAFIGDAVLVAHN 504 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4953> which encodes the amino acid 
sequence <SEQ ID 4954>. Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3427 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 136/200 (68%), Positives = 159/200 (79%) 

Query: 3 FLGEIMKQLQEYIAFDLEFNTVGEHSHIIQVSAVKYSNHQEIALFDTYVHTKVPLQSFIN 62 

FL E MK L YIAFDLEFNTV + SHIIQVSAVKY +H+E+ FDTYV+T VPLQSFIN 
Sbjct: 9 FLEENMKHLDTYIAFDLEFNTVNDVSHIIQVSAVKYDHKKEVDSFDTYVYTDVPLQSFIN 68 



WO 02/34771 



PCT/GB01/04789 



Query: 63 GLTGITARDIIGAPKIEIVLTDFQSFVGDTPLIGYiraYKSDLPLLVENGLDLTSQYQVDL 122 

GLTGIT+ I PK+E V+ F++FV5+ PLIGYN KSDLP+L ENGLDL QYQ+DL 
Sbjct: 69 GLTGITSDKIAAEPKVEEVMRAFKlJF q /GELPLIGYNAQKSDIjPIIiAENGLDIiRDQ'YQIDL 128 

Query: 123 YDEAFVRRSTDLNGIVNLKLTTVADFLGIKGKAHNSLEDARMTARVYEKFLDLDENKIYL 182 

+DEA+ RRS DLNGI NL+L TVA FLGIKG+ HNSLEDARMTA +Y+ FL+ D NK YL 
Sbjct: 129 FDEAYDRRSADLNGIANLRLQTVATFLGIKGRGHNSLEDARMTAVIYKSFLETDTNKAYL 188 

Query: 183 KQQKEVAVDSPFATLGNLFD 202 

QQ+EV D+PFA LG+ FD 
Sbjct: 189 SQQEEVTTDNPFAALGDFFD 208 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1603 

A DNA sequence (GBSx1698) was identified in S.agalactiae <SEQ ID 4955> which encodes the amino 
acid sequence <SEQ ID 4956>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-12.10 Transmembrane 143 - 159 ( 136 - 166) 
INTEGRAL Likelihood = -4.73 Transmembrane 169 - 185 ( 168 - 188) 
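
Each INTEGRAL line gives a likelihood score, a predicted transmembrane segment, and a second range in parentheses. In the outputs shown here, the second range always contains the first (e.g. 143 - 159 within 136 - 166). The sketch below extracts these fields and assumes residue ranges are inclusive. The class and function names are hypothetical. Reading the parenthesized range as an extended boundary is an assumption, not a PSORT definition.

```python
import re
from typing import NamedTuple


class TMSegment(NamedTuple):
    likelihood: float
    core_start: int
    core_end: int
    ext_start: int  # assumed: outer bound of the parenthesized range
    ext_end: int

    @property
    def core_length(self) -> int:
        # Inclusive range, so 143 - 159 covers 17 residues.
        return self.core_end - self.core_start + 1


INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str) -> TMSegment:
    m = INTEGRAL_RE.search(line)
    if m is None:
        raise ValueError(f"not an INTEGRAL line: {line!r}")
    lik, a, b, c, d = m.groups()
    return TMSegment(float(lik), int(a), int(b), int(c), int(d))


lines = [
    "INTEGRAL Likelihood =-12.10 Transmembrane 143 - 159 ( 136 - 166)",
    "INTEGRAL Likelihood = -4.73 Transmembrane 169 - 185 ( 168 - 188)",
]
for seg in map(parse_integral, lines):
    print(seg.likelihood, seg.core_start, seg.core_end, seg.core_length)
```

Under the inclusive-range assumption, every core segment reported in this section spans 17 residues, for example 99 - 115, 86 - 102 and 21 - 37.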

Final Results 

bacterial membrane --- Certainty=0.5840 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB42766 GB:AL049841 transcriptional regulator [Streptomyces 
coelicolor A3(2)] 

Identities = 46/141 (32%) , Positives = 71/141 (49%) , Gaps = 11/141 (7%) 

Query: 5 YSTGDIAKElAGVTVRTVQYYDKRGILSPSELSEGGRRVYSIADLEKLRQIiyLRDLDFSI 64 

YS G +A AGVTVRT+ +YD G+L PSE S G R YS ADL++L+QI++ R+L F + 
Sbjct: 3 YSVGQVAGFAGVTVRTLHHYDDIGLLVPSERSHAGHRRYSDADLDRLQQILFYRELGFPL 62 

Query: 65 DNIKNLFTEDNASQILELFLQVQIRELRL AIDSKKDKLDKAVNLLKTVEKQD 116 

D + L + A L Q ++ R+ A++ + +NL ++ 
Sbjct: 63 DEVAALLDDPAADPRAHLRRQHELLSARIGKLQKMAAAVEQAMEARSMGINL TPEEK 119 

Query: 117 SKTLGYLSDIVLMEENKRKWG 137 

Sbjct: 120 FEVFGDFDPDQYEEEVRERWG 140 

There is also homology to SEQ ID 1712. 

SEQ ID 4956 (GBS372) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 69 (lane 8; MW 55kDa). 

GBS372-GST was purified as shown in Figure 215, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1604 

A DNA sequence (GBSx1699) was identified in S.agalactiae <SEQ ID 4957> which encodes the amino 
acid sequence <SEQ ID 4958>. This protein is predicted to be a cyclopropane-fatty-acyl-phospholipid 
synthase (mma2). Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3145 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD07482 GB:AE000557 cyclopropane fatty acid synthase (cfa) 
[Helicobacter pylori 26695] 
Identities = 167/397 (42%), Positives = 254/397 (63%), Gaps = 14/397 (3%) 

Query: 2 VMDSLIIKQLIKSTFDIPLQVTYPNGNIETYNGSNPHWLKLNKNFSVSELSKDPSIVLG 61
++ ++K + K + QV + + ++ +P LK+++ S++ KD S+ +
Sbjct: 1 MISKFLLKSMFKQWKNGDYQWFWDNSVyRNGEHSPKFTLKIHRPLKFSDIKKDMSLTIA 60

Query: 62 EAVMDGDIEIYGSIQELILSAY-RCGDSFLRNSKFSKLIPKQFHDKKHSKSDIQKHYDIG 120
EA MDG I+I GS+ E++ S Y + L +K IK + S+I KHYD+G

NDFY +WLD+T++YSCAYFK ++D+L AQL K+ H L KL+ +PG KLLDIGCGWG I

KR++E GLE+KVT+ A

GMFEHVGKEI^SQYFQTISKRI^INGIALIHGITGQVGGKHGSGTOSWINKYIFPGGYIP 297
GMFEHVGK+NL YF+ + + L G+ L+H I G TN+W++KYIFPGGY+P
Sbjct: 237 GMFEHVGKDNLPFYFKKVKEVLKRGGMFLLHS I LCCFEGK TNAWVDKYI FPGGYLP 292

LR HY KTL++W NF++ L +V++ ++D+RFI MWDLY

L++CA++F G+ D+FQ LL+ V +T P+T++Y+Y
Sbjct: 353 LRTCASAFRVGSADLFQLLLTNSVD-NTFPLTKEYIY 388

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1605 

A DNA sequence (GBSx1700) was identified in S.agalactiae <SEQ ID 4959> which encodes the amino 
acid sequence <SEQ ID 4960>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4903 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11796 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 44/97 (45%), Positives = 60/97 (61%) 

Query: 1 MMNMQNMMRQAQKLQKQMEQKQADLAASQFTGKSAQELVTVTFTGDKKLISIDYKEAVVD 60 

M NMQ MM+Q QK+QK M + Q +LA G + +VTV G K+++ + KE WD 

Sbjct: 5 MGNMQKMMKQMQKMQKDMAKAQEELAEKVVEGTAGGGMVTVKANGQKE I LDVI I KEE WD 64 

Query: 61 PEDIETLQDMTTQAINDALSQVDDATKKIMGAFAGKM 97 

PEDI+ LQD+ A N+AL +VD+ T + MG F M 
Sbjct: 65 PEDIDMLQDLVLAATNEALKKVDEITNETMGQFTKGM 101 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4961> which encodes the amino acid 
sequence <SEQ ID 4962>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.4451 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 84/99 (84%), Positives = 94/99 (94%) 

Query: 1 MMNMQNMMRQAQKLQKQMEQKQADLAASQFTGKSAQELVTVTFTGDKKLISIDYKEAVVD 60 

MMNMQNMM+QAQKLQKQMIQKQADLAA QFTGKSAQ+LVT TFTGDKKL+ ID+KEAWD 
Sbjct: 1 ^M^QMMMKQAQKLQKQMEQKQ?J3L^AMQFTGKSAQDLVTATFTGDKKLVGIDFKEAWD 60 

Query: 61 PEDIETLQDMTTQAINDALSQVDDATKKIMGAFAGKMPF 99 

PED+ETLQDMTTQAINDAL+Q+D+ TKK +GAFAGK+PF 
Sbjct: 61 PEDVETLQDMTTQAINDALTQIDETTKKTLGAFAGKLPF 99 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1606 

A DNA sequence (GBSxl701) was identified in S.agalactiae <SEQ ID 4963> which encodes the amino 
acid sequence <SEQ ID 4964>. Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3963 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



WO 02/34771 



PCT/GB01/04789 



-1791- 

Example 1607 

A DNA sequence (GBSx1702) was identified in S.agalactiae <SEQ ID 4965> which encodes the amino
acid sequence <SEQ ID 4966>. Analysis of this protein sequence reveals the following: 

Possible site: 48
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.76 Transmembrane 21 - 37 ( 19 - 39)

Final Results

bacterial membrane --- Certainty=0.2105 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10129> which encodes amino acid sequence <SEQ ID
10130> was also identified.
The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1608 

A DNA sequence (GBSx1703) was identified in S.agalactiae <SEQ ID 4967> which encodes the amino
acid sequence <SEQ ID 4968>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1783 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1609 

A DNA sequence (GBSx1704) was identified in S.agalactiae <SEQ ID 4969> which encodes the amino
acid sequence <SEQ ID 4970>. This protein is predicted to be a probable 1,4-dihydroxy-2-naphthoate
octaprenyltransferase. Analysis of this protein sequence reveals the following:



Possible site: 32

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.75 Transmembrane 239 - 255 ( 219 - ?)
INTEGRAL Likelihood = -8.33 Transmembrane 221 - ? ( 219 - 238)
INTEGRAL Likelihood = -6.74 Transmembrane 91 - 107 ( ? - ?)
INTEGRAL Likelihood = -6.32 Transmembrane 39 - 55 ( 35 - 59)
INTEGRAL Likelihood = -3.77 Transmembrane 111 - 127 ( ? - ?)
INTEGRAL Likelihood = -2.97 Transmembrane ? - ? ( 143 - 161)
INTEGRAL Likelihood = -1.28 Transmembrane 275 - 291 ( 275 - 291)
INTEGRAL Likelihood = -0.59 Transmembrane 177 - 193 ( 177 - ?)

Final Results

bacterial membrane --- Certainty=0.4503 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15875 GB:Z99123 alternate gene name: ipa-6d~similar to
quinone biosynthesis [Bacillus subtilis]
Identities = 75/290 (25%), Positives = 139/290 (47%), Gaps = 15/290 (5%)

Query: 5 IFLELVEMKAKTASVLPFLIGLCFSAYYYNSVHPVWGLFFVAMFLFNMFTO^ 64 

I +L TAS +P L+G + +Y +++ + F +++ + +++N Y D+ 

Sbjct: 21 ILWQLTRPHTLTASFVPVLLGTVLAMFYVKVDLLLFLAMLFSCLWI-QIATNIjFNEYYDF 79 

Query: 65 RMAVDL-DYKiSIDTNIIGRENLSLRQIEVI(>4ftSI 1 VITSSMIGLVLVSQVGLPLLWMGLFCF 123 

+ +D+ IR + +I + + + ++G+ + + L +GL 

Sbjct: 80 KRGLDTAESVGIGGAIVRHGMKPKTILQIAIASYGIAILLGVYICASSSWWLALIGLVGM 139 

Query: 124 GIGVLYSFGPRPLSSLPLGEVFSGLTMGFMISLICVYLNTYQMFSWDIUJLSKIFLISLP 183 

IG LY+ GP P++ P GE+FSG+ MG + LI ++ T D +N+ I LIS+P 

Sbjct: 140 AIGYLYTGGPLPIAYTPFGELFSGICMGSVFVLISFFIQT DKIKMQSI-LISIP 192 

Query: 184 OTLWIANLMLANNLCDKEEDEKNHRYTLvHYTGIRGGLLLFAISNSIALLAIVFEFLFGL 243 

+ + + L+NN+ D EED+K R TL G +G + L A S ++A + +V + G 
Sbjct: 193 IAILVGAINLSNNIRDIEEDKKGGRKTLAILMGHKGAVTLLAASFAVAYIWWGLVITGA 252 

Query: 244 APVTVLLSLLLIPFIYKQTKLLWQKQVICRETFVCAVRILALGSATQVLTY 293 

A ++L+P + K Q++ I+A+ S Q T+ 

Sbjct: 253 ASPWLFWFLSVPKPVQAVKGFVQNEMPMN MIVAMKSTAQTNTF 29S 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1610 

A DNA sequence (GBSx1705) was identified in S.agalactiae <SEQ ID 4971> which encodes the amino
acid sequence <SEQ ID 4972>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.22 Transmembrane 155 - 171 ( 154 - 171)

Final Results

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15200 GB:Z99120 similar to NADH dehydrogenase [Bacillus subtilis]
Identities = 178/403 (44%), Positives = 249/403 (61%), Gaps = 7/403 (1%)

Query: 3 EILVLGAGYAGLKAVRMLQKQSG--DFHITLVDMNDYHYEATELHEVAAGSQPKEKITFP 60 

+I++LGAGY GL V L K G D ITLV+ ++YHYE T +HE +AG+ ++ + 
Sbjct: 7 KIVILGAGYGGLMTVTRLTKWGPNDADITLWKHNYHYETTWMHEASAGTLHHDRCRYQ 66 

Query: 61 IKDVINTNKVNFMQDEVLRVDAEWKTVIVKNNGELHY 120 

IKDVIN ++VNF+QD V + + K V + N GEL YDY+V+ LG V ETFGIKG E A 
Sbjct: 67 I KDVINQSR VNFVQDTVKAI KI DE KKV /LAN - GELQYD YLVIGLGAVPETFGI KGLKEYA 125 



Query: 121 LQMTNISQAENIHNHIVNTMKLYRETKDE--NLLKLLVCGAGFTGIEIAGAMVDERPKYA 178
+ NI+ + + HI Y ++ + L ++V GAGFTGIE G + P+
Sbjct: 126 FPIANINTSRLLREHIELQFATYNTEAEKRPDRLTIVVGGAGFTGIEFLGELAARVPELC 185

Query: 179 ALAGVKPEQIEIICVEAATRILPMFDDEIAQYGVNLIKDLGIl^MLGSMIKEIKPGEWY 238

Sbjct: 186 KEYDVDRSLVRIICVEAAPTVLPGFDPEUTDYAVHYLEENGTOFKIGTAVQECTPEGVRV 245

Query: 239 ... 298

Sbjct: 246 G--KKDEEPEQIKSQTVVWAAGVRGHPIVEEAGFENMRGRVKVNPDLRAPGHDNVFILGD 303

Query: 299 VSAFMDTESGRPFPTTAQIATRMGAHVAKNLLHQIKGEATEDFSYSPQGTVASVGNTHGL 356
S FM+ ++ RP+P TAQIA + G VAKNL IKG E+F +GTVAS+G + +
Sbjct: 304 SSLF^DTERPYPPTAQIAMQQGITVAKNLGRLIKGGELEEFKPDIKGTVASLGEHNAV 363

Query: 359 GWGKTKIKKYPASVMKKIIMNKSLVDMGGLKEXLAKGRFDLY 401
GW K+K PAS MKK+I N+SL +GGL L KG+F +
Sbjct: 364 GWYGRKLKGTPASFMKKVIDNRSLFMIGGLGLTLKKGKFKFF 406





There is also homology to SEQ ID 4666. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1611 

A DNA sequence (GBSx1706) was identified in S.agalactiae <SEQ ID 4973> which encodes the amino
acid sequence <SEQ ID 4974>. This protein is predicted to be cytochrome d ubiquinol oxidase, subunit I 
(cydA-1). Analysis of this protein sequence reveals the following: 



>>> Seems to have ... N-terminal signal sequence

INTEGRAL Likelihood = ? Transmembrane ? - 35 ( 15 - 38)
INTEGRAL Likelihood = -5.?? Transmembrane ? - 242 ( 222 - 244)
INTEGRAL Likelihood = -4.?? Transmembrane ? - 146 ( 126 - 149)
INTEGRAL Likelihood = -4.?? Transmembrane ? - 445 ( 422 - 446)
INTEGRAL Likelihood = -3.?? Transmembrane ? - 71 ( 53 - 74)
INTEGRAL Likelihood = -3.?? Transmembrane ? - 358 ( 340 - 359)
INTEGRAL Likelihood = -1.?? Transmembrane ? - 105 ( 89 - 106)
INTEGRAL Likelihood = -0.?? Transmembrane ? - 202 ( 186 - 202)

Final Results

bacterial membrane --- Certainty=0.3654 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15902 GB:Z99123 cytochrome bd ubiquinol oxidase (subunit I) 
[Bacillus subtilis] 

Identities = 246/470 (52%), Positives = 319/470 (67%), Gaps = 12/470 (2%) 



Query: 6 LARFQFAMTTVFHFFFVPFTIGTCLWAIMET^IYVITKNEEYKKLTKFWGNIMLLSFAVG 65 

LAR QFA TT+FHF FVP +IG +VA+MET+Y++ KNE Y K+ KFWG++ L++FAVG 
Sbjct: 6 IARIQFASTTLFHFLFVPMSIGLVFWALKETLYLVKKNELYLKMAKFWGHLFLINFAVG 65 

Query: 66 WTGIIQEFQFGMNWSDYSRFVGDIFGAPLAIEALLAFFMESTFLGLWMFTWDNECKISKK 125 

WTGI+QEFQFG+NWSDYSRFVGD+FGAPLAIEALLAFFMES F+GLW+F WD ++ KK 
Sbjct: 66 VVTGILQEFQFGLNWSDYSRFVGDVFGAPLAIEALLAFFMESIFIGLWIFGWD--RLPKK 123 

Query: 126 LHVTFIWLWFGSLMSA^WILTANSFMQHPVGYE^VNGRAOMTDFLALv^CNPQFFYEFTH 185 

+H IWLV FG++MS+ WILTANSFMQ PVG+ + NGRA+M DF AL+ NPQ + EF H 
Sbjct: 124 IHALCIWLVSFGTIMSSFWILTANSFMQEPVGFTIKNGRAEMNDFGALITNPQLWVEFPH 183



Query: 186 VIFGAITMGGTWAGMSAFRLLKSEQLKDTTVELYKKSVRIGLWALLGSISVMGVGDLQ 245 
VIFGA+ G +AG+SAF+LLK ++ V +K+S ++ ++V L + V G +Q
Sbjct: 184 VIFGALATGAFFIAGVSAFKLLKKKE VPFFKQSFKLAMIVGLCAGLGVGLSGHMQ 238

Query: 246 MK2^IHDQPMKFAAMEGDYEDSGDPAAWSVVAWANEAEHKQVFGIKIPYMIjSILSyGKPS 305
+ L+ QPMK AA EG +EDSGDPAAW+ A + K IK+PY LS L+Y K S 
Sbjct: 239 AEHLMESQPMKMAASEGLWEDSGDPAAWTAFATIDTKNEKSSNEIKVPYALSYIAYQKFS 298

Query: 306 GSVKGMDTANJCELVAKYGKDNYYPMVNLLFYGFRTMAAMGTAIMGVSVLGLFLTRKKKPI 365

GSVKGM T E YGK +Y P V F4 FR M G ++ ++ GL+L R+KK 
Sbjct: 299 GSVKGMKTLQAEYEKIYGKGDYIPPVKTTFWSFRINWGAGVVMI1AALGGLWLNRRKK-- 356


Query: 366 LYKHKWMLWIVALTTFAPFLANTFGWIVTTEQGRYPWTVYGLFKIKDSVSPNVSVASLFVS 425 

L ML 1+ PFIAN+ GWI+TE GR PWTV GIi SVSPNV+ SL S 

Sbjct: 357 LENSKWYLRIMIALISFPFLANSAGWIMTEIGRQPWTVKGLMTTAQSVSPNVTAGSLLFS 416 

Query: 426 NTWFLLFGGIAVMMISLTIRELKKGPEYEDERGHHGAYTSIDPFEEGAY 475
+ +++ L +++ L IRE+KKG E+++ HH S DPF + Y 
Sbjct: 417 IIAFGVMYMILGALLVFLFIREIKKGAEHDN HHDVPVSTDPFSQEVY 463 

No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1612 

A DNA sequence (GBSx1707) was identified in S.agalactiae <SEQ ID 4975> which encodes the amino
acid sequence <SEQ ID 4976>. This protein is predicted to be cytochrome oxidase subunit II (cydB-1). 
Analysis of this protein sequence reveals the following:



Possible site: 22

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = ? Transmembrane ? - ? ( 220 - 250)
INTEGRAL Likelihood = -8.12 Transmembrane ? - ? ( ? - ?)
INTEGRAL Likelihood = ? Transmembrane 1?8 - ? ( ? - 218)
INTEGRAL Likelihood = -6.95 Transmembrane 85 - 101 ( 76 - 103)
INTEGRAL Likelihood = -6.74 Transmembrane 6 - 22 ( 1 - 27)
INTEGRAL Likelihood = -6.16 Transmembrane 300 - 316 ( 298 - 322)
INTEGRAL Likelihood = -5.36 Transmembrane 119 - 135 ( 117 - 143)
INTEGRAL Likelihood = -4.04 Transmembrane 159 - 175 ( 155 - 178)

Final Results

bacterial membrane --- Certainty=0.6795 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15901 GB:Z99123 cytochrome bd ubiquinol oxidase (subunit II)
[Bacillus subtilis]
Identities = 158/331 (47%), Positives = 223/331 (66%), Gaps = 1/331 (0%)

Query: 1 MSALQFFWFFLIGLLFSGFFFLEGFDFGVG14.WQTLTHNEHSKDQVWTIGPVWDGNEVW 60 

M++L WF L+ +LF GFFFLEGFDFGVGMA + L HNE E+ ++ TIGP WD NEW 
Sbjct: 1 MASLHDLWFILVAVLFVGFFFLEGFDFGVG^TRFIX3HNELERRVLINTIGPFWDANEVW 60

Query: 61 LLTGGGAMFASFPYWYASLFSGYYLILLTILFGLIIRGVSFEFRHKVPAEK-KQFWNWTL 119 

LLTG GA+FA+FP WYA++ SGYY+ +■ -t-L. L+ RGV+FEFR KV K + W+W + 
Sbjct: 61 LLTGAGAIFAAFPNWYATMLSGYYIPFVIVIjIjAIiMGRGVAFEFRGKVDHLKWVKVWDWVV 120 

Query: 120 TIGSAIVPFFFGIMFISLICGMPLDASGNl^SAQFSDYFNIFSLVGGVA^lVLLAYLHGLNY 179 

GS I PF G++F +L +GMP+DA N+ A SDY N++S++GGV + LL + HGL + 
Sbjct: 121 FFGSLIPPFVLGVLFTTLFRGMPIDADMNIHAHVSDYIKVYSILGGVTVTLLCFQHGLMF 180

Query: 180 IALKTEGPIRERARNYAQLLYWVLYI^IJALFAVIJLYFKTDFFSNHPIVTTI^WLVIVVLA 239 
I L+T G ++ RAR AQ + V+++ + FA L ++TD F+ +T + ++IV+ 
Sbjct: 181 ITLRTIGDLQM^KmQKIMGVVFVAVIAEaftLSAYQTDMFTRRGEITIPIiA.VLIVICF 240

Query: 240 VLAHASTFKGAEMTAFIASGLSLVSVWLLFQGLFPRVMISSISPIWDLLIQNASSTPYT 299 

+LA K + F +G L V ++F LFPRVM+SS+ YDL + NASS Y+ 

Sbjct: 241 MIAAVFIRKKKDGM'FG^GAGLALWGMIFISLFPRVMVSSLHSAYDLTVANASSGDYS 300 

Query: 300 LKVMSIVAITLVPFVIAYTAWAYYIFRKRIT 330

LKVMSI A+TL+PFV+ W+YY+FRKR++
Sbjct: 301 LKVMSIAALTLLPFVIGSQIWSYYVFRKRVS 331

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1613 

A DNA sequence (GBSx1708) was identified in S.agalactiae <SEQ ID 4977> which encodes the amino
acid sequence <SEQ ID 4978>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq
Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1614 

A DNA sequence (GBSx1709) was identified in S.agalactiae <SEQ ID 4979> which encodes the amino
acid sequence <SEQ ID 4980>. This protein is predicted to be transport ATP-binding protein cydc (cydD). 
Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-16.82 Transmembrane 158 - 174 ( 144 - 182) 

INTEGRAL Likelihood = -6.48 Transmembrane 15 - 31 ( 14 - 34) 

INTEGRAL Likelihood = -5.31 Transmembrane 243 - 259 ( 238 - 266) 

INTEGRAL Likelihood = -2.55 Transmembrane 136 - 152 ( 134 - 152) 

INTEGRAL Likelihood = -0.48 Transmembrane 263 - 279 ( 263 - 279) 

Final Results 

bacterial membrane --- Certainty=0.7729 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15900 GB:Z99123 ABC membrane transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 279/569 (49%), Positives = 401/569 (70%), Gaps = 6/569 (1%)

Query: 2 LDKAVMRLSGIHKLLGLIAGLDVLQAIFIIGQATnjSLSITGLlffiGQKlSSQTVYILLFM 61 

+ K + R G+ ++L L+ L ++Q II QA +LS ++TGL+ G+ ++S I F+ 
Sbjct: 1 MGKDLFRYKGMKRILTLITCXTLIQTAailMQAEWLSEAVTGLFNGKGITSLDPVIGFFL 60 
Query: 62 VSYLGRHVIDYIKNRKLDDFSTAQSSLLRRQLLDKLFDLGPKVVQEQGTGNVVTMALDGV 121

++++ RH + + + + ++ + LR+ LD+LF LGP+ +++GTG +VT+A+4G+ 
Sbjct: 61 IAFIARHGMTVARQKIVYQYAART3ADLRKSFLDQLFRLGPRFAKICEGTGQMVTLAMEGI 120 

Query: 122 SLVENYLRLVLNKMINMSIIPWIILAYIFYLDIESGAILLIVFPLIIIFMIILGYAAQAK 181

S YL L L KM++M+I+P ++ Y+F+ D S IL+ P++IIFMI+LG AQ K 
Sbjct: 121 SQFRRYLELFLPKMVSMAIVPAAWIYVFFQDRTSAIILVAAMPILIIFMILLGLVAQRK 180 



Query: 242 FALDFFTTLSIAIVAVFLGLRLLNEQIYLLPALTILILSPEYFLPVRDFSSDYHATLDGK 301

FALDFFT LS+A VAVFLGLRL++ I L PALT LIL+PEYFLPVR+ +DYHATL+G+ 
Sbjct: 241 FALDFFTMLSVATVAVFLGLRLIDGDILLGPALTALILAPEYFLPVREVGKDYHATLNGQ 300 

Query: 302 NAFQAI QKVLNKTG I KGE - QLVI DDWS KE SRLDLENI AI AYDQKRWEDVTLRFRGHQKV 360 

A + IQ++L++ GKE L++WS+ L L +++ R V D+ L F+G +K+ 

Sbjct: 301 EAGKTIQEILSQPGFKEETPLQLEAWSDQDELXLSGVSVG RSVSDIHLSFKGKKKI 356 

Query: 361 ALVGVSGSGKSSLINLLSGFLGPDNGSLKTOGREVTNLDQEDWHKQMIYIPQTPYVFEMS 420 

++G SG+GKS+LI++L GFL PD G ++V+G ++L W K ++YIPQ PY+F+ + 
Sbjct: 357 GIIGASGAGKSTLIDILGGFLEPDGGMIEWGTSRSHLQDGSWQKNLLYIPQHPYIFDDT 416 

Query: 421 LRDNITFYTPNASDEEVVRAIHMVGLDSLLSELPDGLETRIGNGARPLSGGQAQRIALAR 480 

L +NI FY P+AS E+ RA GL L++ LPDGLE RIG G R LSGGQAQR+ALAR 
Sbjct: 417 LGNNIRFYHPSASAEDTTRAAASAGLTELVTMNLPDGLEGRIGEGGRALSGGQAQRVALAR 476 

Query: 481 AFLDQNRRIMVFDEPTAHLDIETELELKEKMLPLMEDRLVIFATHRLHWLNQMDVIVVME 540

AFL NR I++ DEPTAHLDIETE E+KE ML L D+LV ATHRLHW+ MD I+V+ + 
Sbjct: 477 AFLG-I^PILLLDEPTAHLDIETEYEIKETMLDIjFEDKLVFLATHRLHWMLDMDEIIVIiD 535 

Query: 541 KGRVAEVGSYQELLAKKGYLYQLKHAMGG 569 

GRVAE+G++ ELL KG +L A G 
Sbjct: 536 GGRVAEIGTHKELLEKNGVYTKLVKAQLG 564 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4981> which encodes the amino acid
sequence <SEQ ID 4982>. Analysis of this protein sequence reveals the following: 

Possible site: 53 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.61 Transmembrane 159 - 175 ( 154 - 190) 
INTEGRAL Likelihood =-10.03 Transmembrane 
INTEGRAL Likelihood = -3.03 Transmembrane 
INTEGRAL Likelihood = -1.44 Transmembrane 

Final Results 

bacterial membrane --- Certainty=0.5246 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC22320 GB:U32749 ATP-binding transport protein (cydD) 



Query: 46 MISFYLIAKTFSTFILGHAIALGRLAGLLLLLNWGFVLAILGK QLQGIASQFARDS 102 

+ S+ L A F L A+ LG + L L A GK Q AS + 

Sbjct: 17 VFSYILQAAYFHELSLLSAVILGIVLIAALALR AFAGKKSVQASYFASTKVKHE 70 

Query: 103 LKQSFFEAFIDLDGQFDAHASDADILTLASQGIDSLDTYYGYYL-SLSMRTKWNCTTIMI 161 

L+ + + S + 1+ +AS+G++ L+ Y+G YL L T 

Sbjct: 71 LRSLIYRKLASMPLNQVNQQSTSSIIQVASEGVEQLEIYFGRYLPQLFYSLLAPLTLFAF 130 



Query: 162 LVFLIYPLAGLVFLGVLPLIPLSIVAMQKRSQPNMSHYWSSYMDVGNLFMDDLKGLNTLY 221
L+F + A ++ L +PLIP+SI+A+ K ++ ++ YWS Y+ +G+ F+D+L+GL TL
Sbjct: 131 LIFFSFKTA-IILLICVPLIPMSIIAWKIAKKLLAKYWS1YVGLGSSFLDNLQGLITLK 189 

Query: 222 SYQATERYEQEFSGKAEQFRKATMSLLGFQLQAVGYMDAVMYLGIGLSGFLAVQALATGQ 281

YQ + +AE FRK TM +L QL +V MD + Y G + A+ Q 

Sbjct: 190 IYQDDAYKAKAMDKEAEHFRICITMIWLTMQLNSVSLMDLLAYGGAAIGILTALLQFQNAQ 249

Query: 282 LSFFNFLFFLLIATEFFTPIREQGYGMHLVMMNTKMADRIFSFLDS-VPARKDNKSKTAI 340 

LS + F+L+++EFF P+R G H+ M +D+IF+ LD+ V ++ A 

Sbjct: 250 LSVLGVILFILIiSSEFFIPLRLLGSFFHVAMNGKAASDKIFTLLDTPVETQQSAVDFEAK 309 

Query: 341 NFNQIDIQNISLAY-EKKTVLSGVTMTLTKGQLTAIAGVSGQGKTSLAQLLLKRQSATTG 399 

N Q++I+++ +Y E+K ++G+ +++ QL+ G SG GK++L LL+ A G 
Sbjct: 310 NNVQVEIKDLHFSYSEEKPAITGLNLSILPNQLSVFVGKSGCGKSTLVSLLMGFNKAQQG 369 

Query: 400 HILFDGLDSDNLSQETINQQVLYVSDQSTLLNRSIYDNLRLA-ANLSKKEILDWIDQHGL 458 

ILF+G ++ N+ + + Q+V VS S + ++ +N+ +A + + ++I ++Q L 
Sbjct: 370 EILFNGQNALNIDRTSFYQKVSLVSHSSYVFXGTLRENMTMAKIDATDEQiyACLKQVNL 429 

Query: 459 LSFINWLPDGLDTIVGENGNLLSPGQKQQVICARALLSKRSLYIFDEATSSLDAENERII 518 

F+ GLD + G LS GQ Q++ ARALL LYIFDEATS++D E+E II 

Sbjct: 430 AQFW-DNGGLDMQLLSRGANLSGGQIQRIALARALLHNAELYIFDEATSNIDVESEEII 488 

Query: 519 DNLITRIAKTAIVIVITHKMSRLKGANQVLFLNTGQPACLGKPCDLYRDQPTYRHLVDTQ 578 

I + + +++I+H+++ A+ + L+ G+ G +L Q Y + Q 
Sbjct: 489 LQFIQQFKQQKTIWISHRLA1©.VT<IADCIWLDQGKLIEQGTHKELMEKQGAYAEMFQQQ 548 

Query: 579 ARLE 582 
LE 

Sbjct: 549 KDLE 552 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 143/552 (25%), Positives = 260/552 (46%), Gaps = 12/552 (2%)

Query: 1 MLDKAVMRLSGIHKLLGLLAGLDVLQAIFIIGQAYYLSLSITGLVffiGQKLSSQTVYILLF 60 

+L + R++ LL + A L LQ + + Y ++ + + G ++ + LL 
Sbjct: 16 LLKRLRERIAPKRYLLYVSAFLSWLQFVMRMISFYLIAKTFSTFILGHAIALGRLAGLLL 75 

Query: 61 VVSYLGRHVIDYIKNRKLDDFSTAQSSLLRRQLLDKLFDLGPKVVQEQGTGNVVTMALDG 120

+++ +G V+ + + S L++ + DL + +++T+A G 

Sbjct: 76 LLNWG-FVLAILGKQLQGIASQFARDSLKQSFFEAFIDLDGQFDAHASDADILTLASQG 134 

Query: 121 VSLVENYLRLVLNKMINMSIIPWIILAYIFYLDIESGAILLIVFPLIIIFMIILGYAAQA 180

+ ++ y L+ + 1+ +F + +G + L V PLI + ++ + +Q 

Sbjct: 135 IDSLDTYYGYYLSLSMRTKWNCTTIMILVFLIYPLAGLVFLGVLPLIPLSIVAMQKRSQP 194

Query: 181 KADKQYESYQVLSNHFLDSLRGIDTLKYFGLSKRYGKSIYQTSESFRKATMSTLKIGILS 240 

+ SY + N F+D L+G++TL + ++RY + +E FRKATMS L + + 

Sbjct: 195 NMSHYWSSYMDVGNLFMDDLKGLNTLYSYQATERYEQEFSGKAEQFRKATMSLLGFQLQA 254 

Query: 241 TFALDFFTTLSIAIVAVFLGLRLLNEQIYLLPALTILILSPEYFLPVRDFSSDYHATLDG 300

+D L I + L Q+ L B+++ E+F P+R+ H + 

Sbjct: 255 VGYMDAVMYLGIGLSGFLAVQALATGQLSFFKFLFFLLIATEFFTPIREQGYGMHLVMMN 314 

Query: 301 KNAFQAIQKVLNKTGIKGEQLVIDDWSKE SRLDLENIAIAYDQKRWEDVTLRFRG 356 

I L+ + D-l- SK +++D++NI++AY++K V+ VT4 

Sbjct: 315 TKMADRIFSFLDSVPARK DNKSKTAINFNQIDIQNISLAYEKKTVLSGVTMTLTK 369 

Query: 357 HQKVALVGVSGSGKSSLINLLSGFLGPDNGSLKVDGREVTNLDQEDWHKQMIYIPQTPYV 416

Q A+ GVSG GK+SL LL G + DG + NL QE ++Q++Y+ + 

Sbjct: 370 GQLTAIAGVSGQGKTSLAQLLLKRQSATTGHILFDGLDSDNLSQETINQQVLYVSDQSTL 429 

Query: 417 FEMSLRDNITFYTPNASDEEVVRAIHMVGLDSLLSELPDGLETRIGNGARPLSGGQAQRI 476

S+ DN+ N S +E++ I GL S ++ LPDGL+T +G LS GQ Q++ 

Sbjct: 430 LNRSIYDNLRL-AANLSKKEILDWIDQHGLLSFINWLPDGLDTIVGENGNLLSPGQKQQV 488

Query: 477 ALARAFLDQNRRIMVFDEPTAHLDIETELELKEKMLPLMSDRLVIFATHRLHWLNQMDVI 536 
ARA L + R + +FDE T+ LD EE + + L +VI TH++ L + +
Sbjct: 489 ICARALLSK-RSLYIFDEATSSLDAENERIIDNLITRLAKTAIVIVITHKMSRLKGANQV 547

Query: 537 VVMEKGRVAEVG 548

+ + G+ A +G 
Sbjct: 548 LFLNTGQPACLG 559 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1615 

A DNA sequence (GBSx1710) was identified in S.agalactiae <SEQ ID 4983> which encodes the amino
acid sequence <SEQ ID 4984>. This protein is predicted to be transport ATP-binding protein cydd (cydC). 
Analysis of this protein sequence reveals the following: 

Possible site: ?

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = ? Transmembrane 25? - 275 ( 258 - 284)
INTEGRAL Likelihood = ? Transmembrane 172 - 188 ( 147 - 199)
INTEGRAL Likelihood = ? Transmembrane 150 - 166 ( 147 - 171)
INTEGRAL Likelihood = ? Transmembrane ? - 47 ( 29 - ?)
INTEGRAL Likelihood = ? Transmembrane ? - 84 ( 67 - ?)
INTEGRAL Likelihood = -1.17 Transmembrane 293 - 309 ( 292 - ?)
INTEGRAL Likelihood = -0.69 Transmembrane 494 - 510 ( 493 - ?)

Final Results

bacterial membrane --- Certainty=0.6137 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10127> which encodes amino acid sequence <SEQ ID 
10128> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15899 GB:Z99123 ABC membrane transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 262/573 (45%), Positives = 389/573 (67%), Gaps = 14/573 (2%)

Query: 16 LKTDQWIKPFFKQYKVSLVIALFLGFMTFFSASALMFNSGYLISKSASLPSNILLVYVPI 75 

+K ++WI P+ KQ V+ +FLG +T FSA+ LMF SG+LISK+A+ P NILL+YVPI 

Sbjct: 1 MKKEEWILPYIKQNARLFVLVIFLGAVTIFSAAFLMFTSGFLISKAATRPENILLIYVPI 60 

Query: 76 VLTRAFGIGRPVFRYIERLTSHNWVLRMTSQLRLKLYHSLESNAIFMKRDFRLGDVMGLL 135 

V R FGI R V RY+ERL H+ +L++ S +R++LY+ LE A+ ++ FR GD++G+L 
Sbjct: 61 VAVRTFGIARSVSRYVERLVGHHIILKIVSDMRVVLYNMLEPGALMLRSRFRTGDMLGIL 120

Query: 136 AEDINYLQNLYLRTIFPTIIAWILYSFIIIATGFFSLWFALMMLLYLAIMIFLFPLWSIL 195 

+EDI +LQ+ +L+TIFP I A +LY+ +IA GFFS FA+++ LYL +++ LFP+ S+L
Sbjct: 121 SEDIEHLQDAFLKTIFPAISALLLYAVSVIALGFFSWPFAILLALYLFVLVVLFPVVSLL 180

Query: 196 ANGARQTREKELKNHLYTDLTDNVLGISDWrFSQRGQEYVALHERSESELMAVQKKIRSF 255 

A+ + K +N LY+ LTD V+G+SDW+FS R ++ +E+ E + +++K + F 
Sbjct: 181 VTRAKNAKLKBGPJST\TJYSRLTDAWiGVSDVJMFSGRRHJi.FIDAYEKEERDWFELERKKQRF 240 

Query: 256 DNRRALIVELVFGFLAILVIIWASNQFIGHRGGEA--NWIAAFVLTVFPLSEAFAGLSAA 313

R + + L +L++ W + Q GE IAAFVL VFPL+EAF LS A 

Sbjct: 241 TRWRDFAAQCLVAGLILLMLFrWAGQ QADGSLAKTMIAAFVLVVFPLTEAFLPLSDA 297

Query: 314 AQETNKYSDSIHRLN ELSETYFETTQNQLPNKPYDFSVKNLSFQYKPQEKWVLH 367 

E Y DSI R+N E S+T E+ L + + ++++F Y + VLH 

Sbjct: 298 LGEVPGYQDSIRRMNNVAPQPEASQT--ESGDQILDLQDVTLAFRDVTFSYDNSSQ-VLH 354 



